# Information Theory

I put this on my website a long time ago, maybe around 1997, as an HTML page. This is that page, moved to my blog.

### Introduction

Information is a property of data. A piece of data holds more information if its content is less expected. ‘Man bites dog’ contains more information than ‘Dog bites man’.

The arrival of each new piece of data is an event. Intuitively, if the event is certain then it provides no information. If it is impossible then it provides infinite information. We may represent the Information numerically by using the equation I = log(1/p), sometimes written as I = -log(p), where p is the probability of an event occurring and I is the information provided by that event. This equation satisfies our intuitive ideas about information by providing a value of zero for a certain event (p = 1) and infinity for an impossible event (p = 0). The value I will never be negative: p is at most 1, so 1/p is at least 1 and its logarithm is at least 0. The base of the logarithm is chosen arbitrarily; changing it only scales every value of I by the same constant factor.
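A small Python sketch of I = log(1/p) makes those properties easy to check. The function name `self_information` and its `base` parameter are my own, chosen for illustration. It treats p = 0 as giving infinite information rather than dividing by zero, and it works in base 2 by default, converting to other bases by dividing by log2(base).

```python
import math


def self_information(p: float, base: float = 2.0) -> float:
    """Information I = log(1/p) provided by an event of probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if p == 0.0:
        # An impossible event: 1/p is unbounded, so I is infinite.
        return math.inf
    # Compute in base 2, then rescale: changing the base is just a constant factor.
    return math.log2(1.0 / p) / math.log2(base)


print(self_information(1.0))            # 0.0  (a certain event tells us nothing)
print(self_information(0.5))            # 1.0
print(self_information(0.125))          # 3.0  (rarer events tell us more)
print(self_information(0.125, math.e))  # about 2.079, i.e. 3.0 scaled by ln 2
```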

### Bits as units of Information

When using logarithms of base 2 to calculate Information, e.g. I = log2(1/p), a value of 1 for I is the information provided by an event of probability 1/2: enough to answer a single yes/no question whose two answers are equally likely. There are obvious similarities with the binary system of 1s and 0s, so telecommunications engineers and computer scientists often use base 2 logarithms and refer to each unit of Information as a bit.

In theory any item of information could be conveyed by answering the right series of yes/no questions. An efficient binary encoding therefore corresponds to asking the smallest necessary number of such questions.
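The classic way to see this is a guessing game: someone picks a number from 0 to n-1 and you may only ask yes/no questions. The sketch below is a hypothetical illustration of my own, not anything from the original page. It uses the halving strategy (a binary search), where each question is "is the number at most `mid`?". For 256 equally likely numbers it always takes exactly 8 questions, which matches log2(256) = 8 bits.

```python
import math


def questions_to_identify(secret: int, n: int) -> int:
    """Count the yes/no questions needed to find `secret` among 0..n-1
    by halving the remaining candidates each time (binary search)."""
    low, high = 0, n - 1          # inclusive range of candidates still possible
    questions = 0
    while low < high:
        mid = (low + high) // 2
        questions += 1            # ask: "is the secret <= mid?"
        if secret <= mid:
            high = mid
        else:
            low = mid + 1
    return questions


n = 256
worst = max(questions_to_identify(s, n) for s in range(n))
print(worst, math.log2(n))  # 8 8.0
```

When n is not a power of two, the worst case works out to log2(n) rounded up, because you cannot ask a fraction of a question.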

### Information in a system (Entropy)

The amount of information in a system is a measure of the number of possible states which it may have. A more disorganised system, with more possible states, has greater information and is said to have greater Entropy. Isolated systems tend towards greater entropy, thus becoming more disorganised. The classic example is a volume of gas, which spreads out to fill its container and so tends to maximise its entropy.
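One way to put a number on this, borrowed from Shannon rather than from the original page, is to average I = log2(1/p) over all the possible states: H = sum of p × log2(1/p). When all N states are equally likely, H comes out as log2(N), so more possible states means more entropy. The two probability lists below are made-up examples. The thermodynamic entropy of a real gas is a related but physical quantity, so treat this as an analogy.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy H = sum(p * log2(1/p)) over the states of a system."""
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)


# Eight equally likely states: as disorganised as eight states can be.
uniform = [1 / 8] * 8
# The same eight states, but one is far more likely (a more organised system).
skewed = [0.93] + [0.01] * 7

print(entropy_bits(uniform))  # 3.0, i.e. log2(8)
print(entropy_bits(skewed))   # about 0.56
```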

Note that the total entropy of the universe never decreases. A system can only become more organised at the expense of increased disorder elsewhere, generally in the form of heat dissipated by the work needed to organise it.

### Information capacity

The information capacity of a data store is a measure of how many different states it can be in. For instance, an 8-bit byte holds eight 1s or 0s, in 256 possible combinations. If each combination is equally likely, each has probability 1/256, so the capacity is log2(1/(1/256)) = log2(256) = 8 bits. Note that some amount of power is always required to maintain the integrity of any data store because, like any system, it will tend towards disorder.
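The byte example can be checked by brute force: list every pattern of eight 1s and 0s, count them, and take log2 of the count. The helper `capacity_bits` is a name I've made up for this sketch. It assumes every state of the store is equally usable.

```python
import itertools
import math


def capacity_bits(num_states: int) -> float:
    """Capacity of a store with num_states equally usable states: log2(num_states)."""
    return math.log2(num_states)


# Every pattern of eight 1s and 0s that a byte can hold.
patterns = list(itertools.product((0, 1), repeat=8))
print(len(patterns))                 # 256
print(capacity_bits(len(patterns)))  # 8.0
```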

Similarly, the information capacity of a communications channel is a measure of how many states it can be in during a given time period, stated in bits per second. This is a theoretical maximum capacity which depends on the physical properties of the channel rather than the particular method of coding the data. In theory the channel would actually convey information at the maximum capacity if the data were coded in the most compact form possible.
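Here is a rough sketch of that definition for an idealised, noiseless channel. I'm assuming, purely for illustration, that the channel sends a fixed number of symbols per second and that each symbol can be in one of a fixed number of distinguishable levels. In one second the channel can then be in levels^(symbols per second) states, and log2 of that count simplifies to symbols per second × log2(levels). Real channels have noise, which limits how many levels can be told apart, so this is an upper bound under those assumptions.

```python
import math


def channel_capacity_bps(symbols_per_second: float, levels: int) -> float:
    """Bits per second for an idealised noiseless channel that sends
    `symbols_per_second` symbols, each in one of `levels` distinguishable states.

    In one second the channel can be in levels ** symbols_per_second states,
    and log2 of that count is symbols_per_second * log2(levels).
    """
    return symbols_per_second * math.log2(levels)


# Hypothetical channel: 1000 symbols per second, 4 distinguishable levels each.
print(channel_capacity_bps(1000, 4))  # 2000.0
```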

### Signal to noise ratio

Information theory matured in the field of telecommunications, where every communications channel contains some amount of useless noise. The signal-to-noise ratio compares the power of the wanted signal with the power of that noise.
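In the telecommunications sense, a signal-to-noise ratio is just the power of the wanted signal divided by the power of the noise. Engineers usually quote it in decibels, 10 × log10 of that ratio. The function names and the power figures below are hypothetical; the only requirement is that both powers are in the same units.

```python
import math


def snr(signal_power: float, noise_power: float) -> float:
    """Ratio of wanted signal power to noise power (same units for both)."""
    return signal_power / noise_power


def snr_db(signal_power: float, noise_power: float) -> float:
    """The same ratio expressed in decibels: 10 * log10(signal / noise)."""
    return 10 * math.log10(snr(signal_power, noise_power))


# Hypothetical powers: a signal 100 times stronger than the noise.
print(snr(50.0, 0.5))     # 100.0
print(snr_db(50.0, 0.5))  # 20.0
```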

The term is now often used slightly differently, to refer to how compactly a message expresses its information. In this sense the English language has a low signal-to-noise ratio, because in theory many letters and words could be omitted without the reader understanding any less.
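The redundancy of written English can be given a rough number. This sketch is my own crude illustration, not a standard measurement. It counts how often each single character (letters and spaces) appears in a made-up sample sentence, computes the entropy H of those frequencies, and compares it with log2(k), the most a character could carry if all k distinct characters were equally likely. Redundancy is then 1 - H / log2(k). Because it ignores context (for example, that "q" is almost always followed by "u"), it underestimates how much of English could really be left out.

```python
import math
from collections import Counter


def letter_redundancy(text: str) -> float:
    """Crude zero-order redundancy estimate: 1 - H / log2(k), where H is the
    entropy of single-character frequencies and k the number of distinct
    characters seen. Ignores context, so it underestimates real redundancy."""
    counts = Counter(ch for ch in text.lower() if ch.isalpha() or ch == " ")
    if len(counts) < 2:
        raise ValueError("need at least two distinct characters")
    total = sum(counts.values())
    entropy = sum((c / total) * math.log2(total / c) for c in counts.values())
    return 1.0 - entropy / math.log2(len(counts))


sample = "man bites dog is news but dog bites man is not news"
print(round(letter_redundancy(sample), 2))
```

A result of 0 would mean every character is equally likely and nothing could be dropped; anything above 0 means some characters are more predictable than others.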