For language modelling, an especially useful form of the conditional probability equivalence

\[ P(A, B) = P(A \mid B)\, P(B) \]

is

\[ P(w_1, \ldots, w_n) = P(w_n \mid w_1, \ldots, w_{n-1})\, P(w_1, \ldots, w_{n-1}), \]

and this can be applied repeatedly to give:

\[ P(w_1, \ldots, w_n) = \prod_{k=1}^{n} P(w_k \mid w_1, \ldots, w_{k-1}). \]
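For instance, for a three-word string the product unwinds as

\[ P(w_1, w_2, w_3) = P(w_1)\, P(w_2 \mid w_1)\, P(w_3 \mid w_1, w_2). \]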
This is nice because it shows how to make an estimate of the
probability of the whole string from contributions of the
individual words. It also points up the possibility
of approximations. A particularly simple one is to
assume that the contexts don't affect the individual word
probabilities:

\[ P(w_1, \ldots, w_n) \approx \prod_{k=1}^{n} P(w_k). \]
We can get estimates of the P(w_k) terms from the frequencies of words. This is just word confetti in another form: it misses out the same crucial facts about patterns of usage.
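As a concrete illustration, here is a minimal sketch of such an estimator (Python; the toy corpus, whitespace tokenization, and function names are assumptions for illustration, not from the text):

    from collections import Counter

    def unigram_probs(tokens):
        # Estimate P(w) as the relative frequency of w in the corpus.
        counts = Counter(tokens)
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()}

    def string_prob_unigram(words, probs):
        # P(w1,...,wn) under the context-free ("word confetti") approximation.
        p = 1.0
        for w in words:
            p *= probs.get(w, 0.0)  # unseen words simply get probability 0 here
        return p

    # Toy example:
    probs = unigram_probs("the cat sat on the mat".split())
    print(string_prob_unigram("the cat".split(), probs))  # (1/3) * (1/6)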
But the following approximation, which restricts the context to a single preceding word, is much more realistic:

\[ P(w_1, \ldots, w_n) \approx P(w_1) \prod_{k=2}^{n} P(w_k \mid w_{k-1}). \]
This is called a bigram model. It gets much closer to reality than word confetti does, because it takes limited account
of the relationships between successive words. The next section
describes an application of such a model to the task of language
identification.
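To make the bigram model concrete, here is a minimal sketch of estimation and scoring (Python; the corpus, tokenization, and names are again illustrative assumptions, and a real model would smooth unseen bigrams rather than assign them probability 0):

    from collections import Counter

    def train_bigram_model(tokens):
        # P(w) = count(w)/N; P(w | prev) = count(prev, w) / count(prev).
        unigram_counts = Counter(tokens)
        total = sum(unigram_counts.values())
        unigram_probs = {w: c / total for w, c in unigram_counts.items()}
        bigram_counts = Counter(zip(tokens, tokens[1:]))
        bigram_probs = {(prev, w): c / unigram_counts[prev]
                        for (prev, w), c in bigram_counts.items()}
        return unigram_probs, bigram_probs

    def string_prob_bigram(words, unigram_probs, bigram_probs):
        # P(w1,...,wn) ~ P(w1) * product over k >= 2 of P(wk | w(k-1)).
        p = unigram_probs.get(words[0], 0.0)
        for prev, w in zip(words, words[1:]):
            p *= bigram_probs.get((prev, w), 0.0)  # unseen bigram -> 0 (no smoothing)
        return p

    # Toy example:
    uni, bi = train_bigram_model("the cat sat on the mat and the cat slept".split())
    print(string_prob_bigram("the cat sat".split(), uni, bi))  # 0.3 * (2/3) * (1/2)

Dividing each bigram count by the count of the preceding word is what turns raw frequencies into the conditional probabilities P(w_k | w_{k-1}).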
Chris Brew
8/7/1998