I put this on my website a long time ago, maybe around 1998, as an HTML page. This is it moved to my blog.

### Introduction

Expert systems often estimate the probabilities of interdependent events by giving each parent event a weighting. Bayesian Belief Networks provide a mathematically correct, and therefore more accurate, method of measuring the effects of events on each other. The mathematics involved also allows us to calculate in both directions, so we can, for instance, find out which event was the most likely cause of another.

### Bayesian Probability

#### Bayes’ Theorem

You are probably familiar with the following Product Rule of probability for independent events:

p(AB) = p(A) * p(B), where p(AB) means the probability of A and B happening.

This is actually a special case of the following **Product Rule** for dependent events, where p(A | B) means the probability of A given that B has already occurred:

p(AB) = p(A) * p(B | A)

p(AB) = p(B) * p(A | B)

So because: p(A) * p(B | A) = p(B) * p(A | B)

We have: **p(A | B) = ( p(A) * p(B | A) ) / p(B)**, which is the simpler version of **Bayes’ Theorem**.

This equation gives us the probability of A happening given that B has happened, calculated in terms of other probabilities which we may know.

Note that: p(B) = p(B | A) * p(A) + p(B | ~A) * p(~A)
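This update can be sketched in a few lines of Python. The function follows the two equations above; the numbers passed to it are made up purely for illustration:

```python
def bayes(p_a, p_b_given_a, p_b_given_not_a):
    """Return p(A | B) via Bayes' Theorem.

    p(B) is expanded with the note above:
    p(B) = p(B | A) * p(A) + p(B | ~A) * p(~A)
    """
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_a * p_b_given_a / p_b

# Illustrative values: p(A) = 0.1, p(B | A) = 0.9, p(B | ~A) = 0.2
print(round(bayes(0.1, 0.9, 0.2), 3))  # 0.333
```

Even with a strong likelihood p(B | A) = 0.9, the low prior p(A) = 0.1 keeps the posterior down at one third.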

#### Chaining Bayes’ Theorem

We may wish to calculate p(AB) given that a third event, I, has happened. This is written p(AB | I). Applying the **Product Rule**, with every term conditioned on I:

p(AB | I) = p(A | I) * p(B | AI)

p(AB | I) = p(B | I) * p(A | BI)

So we have: **p(A | BI) = ( p(A | I) * p(B | AI) ) / p(B | I)**, which is another version of Bayes’ Theorem.

This gives us the probability of A happening given that B and I have happened.

This is often quoted as **p(H | EI) = ( p(H | I) * p(E | HI) ) / p(E | I)**, where p(H | EI) is the probability of Hypothesis H given Evidence E in Context I.

By using the product rule we can chain several probabilities together. For instance, to find the probability of H given that E₁, E₂ and I have happened:

p(H | E₁E₂I) = ( p(H | I) * p(E₁E₂ | HI) ) / p(E₁E₂ | I)

and to find the probability of H given that E₁, E₂, E₃ and I have happened:

**p(H | E₁E₂E₃I) = ( p(H | I) * p(E₁E₂E₃ | HI) ) / p(E₁E₂E₃ | I)**

Note that p(E₁E₂E₃ | I) = p(E₁ | E₂E₃I) * p(E₂E₃ | I) = p(E₁ | E₂E₃I) * p(E₂ | E₃I) * p(E₃ | I), which can be used to calculate two of the values in the above equation.

#### An example of Bayes’ Theorem

**p(H | EI) = ( p(H | I) * p(E | HI) ) / p(E | I)**

**p(H | EI) = ( p(H | I) * p(E | HI) ) / ( p(E | HI) * p(H | I) + p(E | ~HI) * p(~H | I) )**

where:

H is the Hypothesis ‘Guilty’,

E is an item of evidence,

I is the context.

p(H | EI) is the probability of the Hypothesis ‘Guilty’ being true, given the evidence in this context.

p(H | I) is the Prior Probability – the subjective probability of the Hypothesis regardless of the evidence.

p(E | HI) is the probability of the evidence being true given that the Hypothesis is true.

p(~H | I) = 1 - p(H | I).

p(E | ~HI) is the probability of the evidence given that the hypothesis is not true – this measures the chances of the evidence being caused by something other than the defendant’s guilt. If this is high then naturally the hypothesis will be unlikely.
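The courtroom example can be run with concrete numbers. The values below are invented for illustration; nothing in the example fixes them:

```python
# Illustrative values only (not from the example above):
p_h = 0.3              # p(H | I): prior probability of 'Guilty'
p_e_given_h = 0.8      # p(E | HI): chance of the evidence if guilty
p_e_given_not_h = 0.1  # p(E | ~HI): chance of the evidence from some other cause

# p(E | I) = p(E | HI) * p(H | I) + p(E | ~HI) * p(~H | I)
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# p(H | EI) = p(H | I) * p(E | HI) / p(E | I)
p_h_given_e = p_h * p_e_given_h / p_e
print(round(p_h_given_e, 3))  # 0.774
```

Because p(E | ~HI) is low here, the evidence is hard to explain away, and the posterior rises well above the prior of 0.3. Raising p(E | ~HI) towards p(E | HI) drags the posterior back towards the prior.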

#### Assuming Conditional Independence

If, given that I is true, E₁ being true will not affect the probability of E₂ being true, then a simpler version of the chained Bayes’ Theorem is possible:

p(H | E₁E₂I) = ( p(H | I) * p(E₁ | HI) * p(E₂ | HI) ) / ( p(E₁ | I) * p(E₂ | I) )

This version makes it very easy to introduce new evidence into the situation. However, Conditional Independence is only true in some special situations.
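A minimal sketch of this incremental update, written in odds form so that p(Eₖ | I) never has to be supplied directly. Note the sketch assumes the evidence items are conditionally independent given H and given ~H (the usual naive-Bayes assumption, slightly stronger than the statement above); the likelihood values are hypothetical:

```python
def posterior(prior, evidence):
    """Chained Bayesian update assuming conditional independence.

    evidence is a list of (p(E_k | HI), p(E_k | ~HI)) pairs.
    Multiplying likelihoods and renormalising at the end is
    equivalent to applying Bayes' Theorem once per item.
    """
    num = prior            # accumulates p(H | I) * product of p(E_k | HI)
    den = 1 - prior        # accumulates p(~H | I) * product of p(E_k | ~HI)
    for p_e_given_h, p_e_given_not_h in evidence:
        num *= p_e_given_h
        den *= p_e_given_not_h
    return num / (num + den)

# One item of evidence from an even prior:
print(round(posterior(0.5, [(0.8, 0.2)]), 3))   # 0.8
# A second identical item pushes the posterior further:
print(round(posterior(0.5, [(0.8, 0.2)] * 2), 3))
```

Adding a new item of evidence is just one more pair in the list, which is exactly the convenience the equation above provides.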

#### Prior Probabilities

One characteristic of Bayes’ Theorem is p(H | I), the probability of the hypothesis in context I regardless of the evidence. This is referred to as the Prior Probability. It is generally very subjective and is therefore often frowned upon. This is not a problem as long as the prior probability plays a small role in the result. When the result is overly dependent on the prior probability, more evidence should be considered.

#### Bayesian Belief Networks

A Bayesian Belief Network (BBN) defines various events, the dependencies between them, and the conditional probabilities involved in those dependencies. A BBN can use this information to calculate the probabilities of various possible causes being the actual cause of an event.

#### Setting up a BBN

For instance, suppose that event C can be affected by events A and B. We may know the following probabilities:

**A**:

| True | False |
| --- | --- |
| p(A) = 0.1 | p(~A) = 0.9 |

**B**:

| True | False |
| --- | --- |
| p(B) = 0.4 | p(~B) = 0.6 |

**C**: Note that when dependencies converge, there may be several conditional probabilities to fill in, though some can be calculated from others because the probabilities for each state should sum to 1.

| C | A=True, B=True | A=True, B=False | A=False, B=True | A=False, B=False |
| --- | --- | --- | --- | --- |
| True | p(C \| AB) = 0.8 | p(C \| A~B) = 0.6 | p(C \| ~AB) = 0.5 | p(C \| ~A~B) = 0.5 |
| False | p(~C \| AB) = 0.2 | p(~C \| A~B) = 0.4 | p(~C \| ~AB) = 0.5 | p(~C \| ~A~B) = 0.5 |

#### Calculating Initialised probabilities

Using the known probabilities we may calculate the ‘**initialised**’ probability of C, by summing the various combinations in which C is true, and breaking those probabilities down into known probabilities:

p(C) = p(CAB) + p(C~AB) + p(CA~B) + p(C~A~B)

= p(C | AB) * p(AB) + p(C | ~AB) * p(~AB) + p(C | A~B) * p(A~B) + p(C | ~A~B) * p(~A~B)

= p(C | AB) * p(A) * p(B) + p(C | ~AB) * p(~A) * p(B) + p(C | A~B) * p(A) * p(~B) + p(C | ~A~B) * p(~A) * p(~B), since A and B are independent

= 0.518

So as a result of the conditional probabilities, C has a 0.518 chance of being true in the absence of any other evidence.
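The derivation above can be checked directly, summing p(C | A, B) * p(A) * p(B) over the four combinations of the parents:

```python
# Values from the tables above
p_a, p_b = 0.1, 0.4
p_c_given = {  # p(C | A, B) keyed by the truth values of (A, B)
    (True, True): 0.8, (True, False): 0.6,
    (False, True): 0.5, (False, False): 0.5,
}

# Initialised probability: sum over every parent combination
p_c = sum(p_c_given[a, b]
          * (p_a if a else 1 - p_a)
          * (p_b if b else 1 - p_b)
          for a in (True, False) for b in (True, False))
print(round(p_c, 3))  # 0.518
```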

#### Calculating Revised probabilities

If we know that C is true, we can calculate the ‘revised’ probabilities of A or B being true (and therefore the chance that each caused C to be true), by using Bayes’ Theorem with the initialised probability:

p(B | C) = ( p(C | B) * p(B) ) / p(C)

= ( ( p(C | AB) * p(A) + p(C | ~AB) * p(~A) ) * p(B) ) / p(C)

= ( (0.8 * 0.1 + 0.5 * 0.9) * 0.4 ) / 0.518

= 0.409

p(A | C) = ( p(C | A) * p(A) ) / p(C)

= ( ( p(C | AB) * p(B) + p(C | A~B) * p(~B) ) * p(A) ) / p(C)

= ( (0.8 * 0.4 + 0.6 * 0.6) * 0.1 ) / 0.518

= 0.131

So we could say that, given C is true, B is more likely than A to have been the cause.
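The two revised probabilities can be reproduced in a few lines, marginalising over the other parent to get p(C | B) and p(C | A) before applying Bayes’ Theorem:

```python
# Values from the BBN example above
p_a, p_b, p_c = 0.1, 0.4, 0.518

# p(C | B) = p(C | AB) * p(A) + p(C | ~AB) * p(~A)
p_c_given_b = 0.8 * p_a + 0.5 * (1 - p_a)
# p(C | A) = p(C | AB) * p(B) + p(C | A~B) * p(~B)
p_c_given_a = 0.8 * p_b + 0.6 * (1 - p_b)

# Bayes' Theorem with the initialised probability p(C)
p_b_given_c = p_c_given_b * p_b / p_c
p_a_given_c = p_c_given_a * p_a / p_c
print(round(p_b_given_c, 3), round(p_a_given_c, 3))  # 0.409 0.131
```

Note that B comes out more likely mainly because its prior (0.4) is four times A’s (0.1); the conditional probabilities alone favour neither parent strongly.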