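Furthermore, because
$P(\mathit{sherlock}_{n-1}, \mathit{holmes}_n)$ means exactly the same thing as
$P(\mathit{holmes}_n, \mathit{sherlock}_{n-1})$, it follows that:

\[ P(\mathit{sherlock}_{n-1})\, P(\mathit{holmes}_n \mid \mathit{sherlock}_{n-1}) = P(\mathit{holmes}_n)\, P(\mathit{sherlock}_{n-1} \mid \mathit{holmes}_n) \]
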
This equivalence works for any pair of words, in the form:

\[ P(w_{n-1})\, P(w_n \mid w_{n-1}) = P(w_n)\, P(w_{n-1} \mid w_n) \]

You can then divide through by $P(w_n)$ to get the usual form of
Bayes' rule. This is:

\[ P(w_{n-1} \mid w_n) = \frac{P(w_{n-1})\, P(w_n \mid w_{n-1})}{P(w_n)} \]

At first sight, all this algebra looks circular, because
it only tells you how to calculate one probability on the basis of
another which looks nearly identical. To see why it is not
always circular, it's best to step aside from linguistics for
a moment and consider an example from medicine.
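
Before leaving linguistics, it may help to check the algebra on actual counts. The following is a minimal sketch, not part of the original notes: the toy corpus and the helper names (`p_prev`, `p_cur`, `p_cur_given_prev`, `p_prev_given_cur`) are invented for illustration, and probabilities are estimated as simple relative frequencies over adjacent word pairs. Exact fractions are used so that the two sides of each identity can be compared without rounding error.

```python
from collections import Counter
from fractions import Fraction

# Toy corpus, invented purely for illustration (an assumption, not real data).
tokens = "sherlock holmes met mycroft holmes and sherlock holmes smiled".split()

# Adjacent pairs (w_{n-1}, w_n) and the counts needed for relative frequencies.
bigrams = list(zip(tokens, tokens[1:]))
pair_counts = Counter(bigrams)
prev_counts = Counter(prev for prev, _ in bigrams)  # word in position n-1
cur_counts = Counter(cur for _, cur in bigrams)     # word in position n
total = len(bigrams)

def p_prev(word):
    """Estimate P(w_{n-1} = word)."""
    return Fraction(prev_counts[word], total)

def p_cur(word):
    """Estimate P(w_n = word)."""
    return Fraction(cur_counts[word], total)

def p_cur_given_prev(cur, prev):
    """Estimate P(w_n = cur | w_{n-1} = prev)."""
    return Fraction(pair_counts[(prev, cur)], prev_counts[prev])

def p_prev_given_cur(prev, cur):
    """Estimate P(w_{n-1} = prev | w_n = cur)."""
    return Fraction(pair_counts[(prev, cur)], cur_counts[cur])

prev, cur = "sherlock", "holmes"

# Both factorisations of the joint probability should agree.
joint_via_prev = p_prev(prev) * p_cur_given_prev(cur, prev)
joint_via_cur = p_cur(cur) * p_prev_given_cur(prev, cur)

# Bayes' rule: recover P(w_{n-1} | w_n) from P(w_n | w_{n-1}).
bayes = p_prev(prev) * p_cur_given_prev(cur, prev) / p_cur(cur)
direct = p_prev_given_cur(prev, cur)

print(joint_via_prev, joint_via_cur)  # 1/4 1/4
print(bayes, direct)                  # 2/3 2/3
```

In this toy corpus "sherlock" is always followed by "holmes", but "holmes" is preceded by "sherlock" only two times out of three, so the two conditional probabilities differ even though each can be computed from the other.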
Chris Brew
8/7/1998