Sparse Distributed Memory

I put this on my website a long time ago, maybe around 1999, as an HTML page. This is that page, moved to my blog.

Introduction

Pentti Kanerva’s Sparse Distributed Memory model is based on simple mathematical properties. It can be used to store and recall large amounts of data efficiently, without requiring that the data be completely accurate or that we know exactly what we need to recall.

Though it may not be an exact model of human memory, it shares enough characteristics to suggest that human memory works in a similar way.

n-Dimensional Binary Space

The model deals with data in its binary form, without regard to what that data represents. All data can be stored as binary, although some translation format may need to be chosen.

In theory each piece of data is simply one value in the set of values that could have been expected, or equivalently one point in the space of possible data points. This space is very simple if the data is binary: the space for a value which is n bits long has n dimensions, with only two possible points (0 and 1) in each dimension.

For a 2-bit value, imagine a 2 by 2 square. For a 3-bit value, imagine a 2 by 2 by 2 cube. There is no need to visualise values with more bits because the theory is so simple. In fact you may not find it useful to think in terms of a space at all.
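
If it helps to see the space concretely, here is a tiny Python sketch (my own illustration, not from Kanerva) that enumerates the 3-bit space: eight points, one per corner of the 2 by 2 by 2 cube.

```python
from itertools import product

# Every 3-bit value is one corner of a 2 x 2 x 2 cube:
# two possible points (0 or 1) in each of three dimensions.
for corner in product((0, 1), repeat=3):
    print(corner)  # (0, 0, 0), (0, 0, 1), ... (1, 1, 1)
```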

The distance between two points

The model needs to calculate how much one value is like another. Continuing with the ‘space of points’ idea, we could calculate the distance between two points just as we would for a 2 or 3 dimensional space, using Pythagoras’ theorem. However, for such a simple space Pythagoras seems like overkill, so we can use a simpler measure called the ‘Hamming distance’.

The Hamming distance between two binary values is the number of bit positions in which they differ. In a binary space the Euclidean distance is simply the square root of this count, so the Hamming distance ranks points by similarity exactly as Pythagoras would, and it is entirely adequate for the model.
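
In code the Hamming distance is a one-liner: XOR the two values and count the set bits. A minimal sketch in Python (the function name is my own):

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

# 0b1011 and 0b0010 differ in the first and fourth bits (reading from the right).
assert hamming(0b1011, 0b0010) == 2
```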

Sparseness

In a space with many dimensions, most of the locations inside a ‘circle’ (all the points within a given distance of a centre) lie near its edge, which means that each location is far from most of the other locations. This effect is especially strong for binary values under the Hamming distance, and it is demonstrated mathematically and by example in Kanerva’s book.

A value is ‘indifferent’ to another if they differ by n/2 or more bits. Each value in the binary space is indifferent or very nearly indifferent to the vast majority (about 99 percent when n is large) of other values. This characteristic of the space is crucial to Kanerva’s model, because it allows a nearly-correct location to be much closer to the correct location than to the wrong locations.
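
This is easy to check by experiment. The distance between two random n-bit values follows a binomial distribution with mean n/2 and standard deviation sqrt(n)/2 (about 16 bits when n is 1000), so virtually every pair is close to indifferent. A quick sketch, reusing the hamming function above:

```python
import random

random.seed(0)
n = 1000

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

# Distances between random pairs of n-bit values cluster tightly around n/2 = 500.
samples = [hamming(random.getrandbits(n), random.getrandbits(n))
           for _ in range(10_000)]
print(min(samples), max(samples))  # typically around 430 and 570: never near 0 or n
```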

Distributed

When a value is written to a location (an address), the value is written into every location in the circle about that location. Likewise, when a value is read from a location it is calculated by examining the contents of every location in the circle about that location. A good approximation of the best match can be calculated by averaging each bit individually. This should not require much more computation than is already required to read the contents of every location in the circle.

Therefore, each address is involved in the storage of many values, and many addresses are involved in the storage of any one value. This is why the memory is ‘Distributed’. Because there is no need for the locations to be physically adjacent, this should make the memory resistant to local damage.

The more often a value is stored, the easier it will be to recall. This also allows older memories to fade gradually.
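
One standard way to realise this, and the one Kanerva’s book uses, is to give every location one counter per bit: writing a 1 increments that bit’s counter and writing a 0 decrements it, and reading sums the counters over the circle and takes the sign of each sum. For a space small enough to enumerate completely, a sketch (parameter choices and names are mine) looks like this:

```python
N = 10          # bits per value: small enough to keep all 2**N locations
RADIUS = 2      # the 'circle': every location within this Hamming distance
SPACE = 2 ** N

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

# One counter per bit at every location in the space.
counters = [[0] * N for _ in range(SPACE)]

def write(address: int, value: int) -> None:
    # Write the value into every location in the circle about the address.
    for loc in range(SPACE):
        if hamming(loc, address) <= RADIUS:
            for bit in range(N):
                counters[loc][bit] += 1 if (value >> bit) & 1 else -1

def read(address: int) -> int:
    # Average each bit individually over the circle: a positive sum
    # means more 1s than 0s were written there.
    sums = [0] * N
    for loc in range(SPACE):
        if hamming(loc, address) <= RADIUS:
            for bit in range(N):
                sums[bit] += counters[loc][bit]
    return sum(1 << bit for bit in range(N) if sums[bit] > 0)

# Usage: a value stored at an address can be read back from a nearby address.
write(0b1010101010, 0b1010101010)
print(bin(read(0b1010101011)))  # expected: 0b1010101010
```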

Hard Locations

An n-dimensional binary space has 2^n points. When dealing with large values, for instance 1000-bit values, there are far too many points for any computer (or brain) to keep track of them all. Kanerva found that, when he ignored most of the points, the distributed nature of the space meant the model still had the same properties. The points which remain are called ‘hard locations’. Of course we lose some accuracy but it is worth it to make the model possible.
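
With hard locations the sketch above barely changes: instead of keeping all 2^n locations we keep a random sample of them, and the circle is taken over that sample. The sizes and radius below are my own illustrative choices; the radius is set just under the indifference distance so that each circle catches only a few per cent of the hard locations.

```python
import random

random.seed(1)

N = 256         # bits per value; the full space of 2**256 points is unmanageable
M = 2000        # the random sample of points we keep: the hard locations
RADIUS = 112    # just under the indifference distance N/2 = 128, so each
                # circle contains only a few per cent of the hard locations

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

hard_addresses = [random.getrandbits(N) for _ in range(M)]
hard_counters = [[0] * N for _ in range(M)]

def write(address: int, value: int) -> None:
    # Write into every hard location in the circle about the address.
    for i, loc in enumerate(hard_addresses):
        if hamming(loc, address) <= RADIUS:
            for bit in range(N):
                hard_counters[i][bit] += 1 if (value >> bit) & 1 else -1

def read(address: int) -> int:
    # Majority vote per bit over the hard locations in the circle.
    sums = [0] * N
    for i, loc in enumerate(hard_addresses):
        if hamming(loc, address) <= RADIUS:
            for bit in range(N):
                sums[bit] += hard_counters[i][bit]
    return sum(1 << bit for bit in range(N) if sums[bit] > 0)
```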

Using the value as the address

When values are used as the addresses where they are stored, it becomes possible to home in on a stored value starting from an inaccurate (or incomplete) version of that value. The value read at the inaccurate address will be a more accurate version of it, because the circle about the inaccurate address overlaps heavily with the circle into which the value was originally written.

If this process is repeated then the address read will converge on the original stored value. If the iterations don’t converge then either the value wasn’t stored, or we have insufficient information.
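
Continuing the hard-location sketch above, homing in is just repeated reading, with each result used as the next address. Storing a pattern as its own address, corrupting some of its bits, and iterating should drive the Hamming distance back to zero; with an unstored pattern, or too many corrupted bits, the reads wander instead of converging.

```python
# Store a pattern autoassociatively: the value is its own address.
pattern = random.getrandbits(N)
write(pattern, pattern)

# Corrupt 20 of its 256 bits to simulate an inaccurate starting value.
noisy = pattern
for bit in random.sample(range(N), 20):
    noisy ^= 1 << bit

# Read repeatedly, using each result as the next address.
guess = noisy
for step in range(5):
    guess = read(guess)
    print(step, hamming(guess, pattern))  # distance should drop to 0 and stay there
```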

This is probably what makes the model seem most like human memory: it can recall information even when it isn’t sure what it should recall.

Further Reading

Sparse Distributed Memory, Pentti Kanerva (MIT Press, 1988)

Artificial Minds, Stan Franklin – Mentions the theory briefly.

Information Theory – Deals, in part, with storing information as binary data.