A Markov model is a random process in which the transition
probability to the next state depends solely on the previous
state, as in the following:

\[
P(s_n \mid s_1, \ldots, s_{n-1}) = P(s_n \mid s_{n-1})
\]

This is another form of the bigram model that we saw earlier.
You can view any process in which the output probability distribution
depends only on the last k states as a Markov process: regard each
possible sequence of the last k states of the small machine as the
label of a single state in a much bigger machine, as shown in
figure 8.2.
Figure 8.2: Relabelling a Markov process
The relabelled model has transitions:

\[
P(\langle s_{n-k+1}, \ldots, s_n \rangle \mid \langle s_{n-k}, \ldots, s_{n-1} \rangle)
  = P(s_n \mid s_{n-k}, \ldots, s_{n-1})
\]
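In code, the relabelling is just a matter of reading each tuple of the
last k symbols as a single state. A minimal sketch in Python (the
function name relabel and the toy transition table are mine, for
illustration, not from the text):

    from collections import defaultdict

    def relabel(small):
        """Recast an order-k process as a first-order one: each state
        of the big machine is the tuple of the last k symbols, and
        emitting a symbol slides that window along by one."""
        big = defaultdict(dict)
        for context, dist in small.items():
            for symbol, p in dist.items():
                big[context][context[1:] + (symbol,)] = p
        return dict(big)

    # A toy order-2 process over the alphabet {a, b}:
    small = {('a', 'b'): {'a': 0.7, 'b': 0.3},
             ('b', 'a'): {'b': 1.0},
             ('b', 'b'): {'a': 1.0}}
    print(relabel(small))
    # {('a', 'b'): {('b', 'a'): 0.7, ('b', 'b'): 0.3},
    #  ('b', 'a'): {('a', 'b'): 1.0},
    #  ('b', 'b'): {('b', 'a'): 1.0}}

Each transition of the small machine becomes a transition between
window states, so the bigger machine really is first-order.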
These larger models are called kth-order Markov models. For
language identification, 4th-order Markov models are quite
effective (they take account of four characters of context).
Given 6000 words of a data-intensive linguistics research report
as training data, you get the results shown in figure 8.3.
Figure 8.3: Randomly generated strings from several orders of Markov model
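Strings like those in figure 8.3 can be produced by counting, for each
k-character context, how often each character follows it, then
repeatedly sampling a next character from those counts. A minimal
sketch, assuming the training text sits in a plain file (the filename
report.txt and the names train and generate are illustrative):

    import random
    from collections import defaultdict, Counter

    def train(text, k):
        """For each k-character context, count how often each
        character follows it: an order-k Markov model over characters."""
        counts = defaultdict(Counter)
        for i in range(len(text) - k):
            counts[text[i:i + k]][text[i + k]] += 1
        return counts

    def generate(counts, n):
        """Emit n characters, each sampled given the previous k."""
        state = random.choice(list(counts))
        out = list(state)
        for _ in range(n):
            chars, weights = zip(*counts[state].items())
            c = random.choices(chars, weights=weights)[0]
            out.append(c)
            state = state[1:] + c       # slide the context window
            if state not in counts:     # dead end: restart from a seen context
                state = random.choice(list(counts))
        return ''.join(out)

    text = open('report.txt').read()    # stands in for ~6000 words of text
    model = train(text, 4)              # four characters of context
    print(generate(model, 200))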
The numbers down the side are the orders of the models:
0 is word-confetti, 1 is bigrams, and anything above that is a
higher-order model. Higher orders are not quite what we want, because
they are brittle: you get verbatim reproduction of the training text,
and texts that differ from the training material often get zero
probability. To fix this you either need lots of training data or a
good technique for smoothing the probabilities. We'll come to that in
section 8.3, but for the moment we'll note the problem and move on to
formalising the task of deciding which language a particular string
came from.
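To see the brittleness concretely, here is a small sketch of scoring a
string under the unsmoothed model, reusing the counts produced by
train above (string_probability is an illustrative name, not the
course's):

    def string_probability(counts, k, s):
        """Score s under the unsmoothed order-k model: the product of
        the conditional probabilities of each character in context."""
        p = 1.0
        for i in range(len(s) - k):
            context, c = s[i:i + k], s[i + k]
            dist = counts.get(context, {})
            total = sum(dist.values())
            # An unseen context or continuation contributes probability
            # zero, so the whole string scores zero: the brittleness
            # noted above, which smoothing (section 8.3) addresses.
            p *= dist[c] / total if c in dist else 0.0
        return p

    # Any string containing a 5-gram absent from the training report
    # scores 0.0, however plausible the rest of it is:
    # string_probability(model, 4, 'a sentence the report never used')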