In the case of our medical decision problem, this comparison hinged
on the prior probabilities of the diseases, and we could
proceed analogously, asking the client of the language identification
system for estimates of $p(\mathrm{English})$ and $p(\mathrm{Spanish})$.
But for this decision problem we don't know the priors, so we
just assume that English and Spanish are equally
likely. This is the assumption of uniform (or sometimes
uninformative) priors. Provided there are big differences in the
conditional probabilities, the decision is going to be insensitive
to the precise values of the priors.
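To see why the priors matter so little, here is a minimal sketch in Python. The two log-likelihoods are made-up numbers standing in for what the English and Spanish models might assign to one test string, and the names `posterior_log_odds` and `decide` are illustrative, not part of any library.

```python
import math

def posterior_log_odds(loglik_english, loglik_spanish, prior_english=0.5):
    """Log posterior odds of English over Spanish for one test string.

    A positive value means English is preferred. With prior_english=0.5
    (uniform priors) the prior term is log(1) = 0 and drops out.
    """
    prior_spanish = 1.0 - prior_english
    prior_term = math.log(prior_english / prior_spanish)
    return (loglik_english - loglik_spanish) + prior_term

# Hypothetical log-likelihoods of the same string under each model.
LOGLIK_ENGLISH = -120.0
LOGLIK_SPANISH = -135.0

def decide(prior_english):
    """Pick the language with the larger posterior probability."""
    odds = posterior_log_odds(LOGLIK_ENGLISH, LOGLIK_SPANISH, prior_english)
    return "English" if odds > 0 else "Spanish"

# The decision is the same across a wide range of priors.
decisions = {p: decide(p) for p in (0.01, 0.1, 0.5, 0.9, 0.99)}
```

With a gap of 15 in log-likelihood, the prior for English would have to be vanishingly small before the decision flipped to Spanish.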
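Strictly, the probability of observing a particular test string
$S = s_1 \ldots s_N$ given a Markov model $M$ of order $k$, like
$M_{\mathrm{Spanish}}$ or $M_{\mathrm{English}}$, is:

\[
p(S \mid M) = p(s_1 \ldots s_k \mid M) \prod_{i=k+1}^{N} p(s_i \mid s_{i-k} \ldots s_{i-1}, M)
\]

but for practical purposes it is just as good to drop the leading
term $p(s_1 \ldots s_k \mid M)$. Variations in this term are going to be
massively outweighed by the contribution of the terms in the product.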
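You can rearrange the product by grouping together terms which
involve the same $(k+1)$-gram (for example, pulling together all instances
of "th"), to get:

\[
p(S \mid M) \approx \prod_{w_1 \ldots w_{k+1} \in S} p(w_{k+1} \mid w_1 \ldots w_k, M)^{T(w_1 \ldots w_{k+1}, S)}
\]

where the product runs over the distinct $(k+1)$-grams in $S$ and
$T(w_1 \ldots w_{k+1}, S)$ is the number of times the $(k+1)$-gram
$w_1 \ldots w_{k+1}$ occurs in the test string. [Need to spell this out in
more detail, with an example and a diagram.] NB. Dunning gets this formula
wrong, using a product instead of an exponent. The next one is right.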
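A small worked example may help with the regrouping. The sketch below takes a first-order model (k = 1, so the terms are bigrams) and checks that multiplying position by position gives the same answer as raising each distinct bigram's probability to the power of its count. The function `toy_cond_prob` and its numbers are invented for illustration and are not estimates from any corpus.

```python
import math
from collections import Counter

def ngram_counts(text, k):
    """Count how often each (k+1)-gram occurs in text."""
    return Counter(text[i:i + k + 1] for i in range(len(text) - k))

def product_in_order(text, k, cond_prob):
    """Multiply the conditional probabilities position by position.

    The leading term for the first k characters is dropped.
    """
    result = 1.0
    for i in range(k, len(text)):
        result *= cond_prob(text[i - k:i], text[i])
    return result

def product_grouped(text, k, cond_prob):
    """Same product, grouped: one factor per distinct (k+1)-gram,
    raised to the number of times that (k+1)-gram occurs."""
    result = 1.0
    for gram, count in ngram_counts(text, k).items():
        result *= cond_prob(gram[:k], gram[k]) ** count
    return result

def toy_cond_prob(context, char):
    """Hypothetical p(char | context): 0.5 for 'h' after 't', else 0.1."""
    return 0.5 if (context, char) == ("t", "h") else 0.1

S = "the thin thing"
counts = ngram_counts(S, 1)   # "th" occurs three times in S
in_order = product_in_order(S, 1, toy_cond_prob)
grouped = product_grouped(S, 1, toy_cond_prob)
```

The three occurrences of "th" contribute a single factor of 0.5 cubed in the grouped form, which is exactly the exponent in the formula.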
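As is usually the case when working with probabilities, taking
logarithms helps to keep the numbers stable. This gives:

\[
\log p(S \mid M) \approx \sum_{w_1 \ldots w_{k+1} \in S} T(w_1 \ldots w_{k+1}, S) \, \log p(w_{k+1} \mid w_1 \ldots w_k, M)
\]

a sum over the distinct $(k+1)$-grams of $S$, with each log probability
weighted by the count $T$ of that $(k+1)$-gram in $S$.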
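The stability point is easy to demonstrate. In the sketch below, 400 hypothetical conditional probabilities of 0.01 are multiplied directly and then combined as a sum of logs. The numbers are chosen only to force the issue with ordinary double-precision floats.

```python
import math

def direct_product(probs):
    """Multiply probabilities directly; underflows for long strings."""
    result = 1.0
    for p in probs:
        result *= p
    return result

def log_sum(probs):
    """Sum of log probabilities; stays well within floating-point range."""
    return sum(math.log(p) for p in probs)

# Hypothetical: a 400-character string, every transition has probability 0.01.
probs = [0.01] * 400

product = direct_product(probs)   # true value 1e-800 is not representable
total = log_sum(probs)            # 400 * log(0.01)
```

The direct product collapses to zero, so two languages could no longer be told apart, while the log score remains an ordinary negative number that can be compared.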
We can compare these for different languages, and choose the language
model which is most likely to have generated the given string. If the
language models sufficiently reflect the languages, comparing the
models will get us the right conclusions about the languages.
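Putting the pieces together, the comparison might be sketched as follows. The model tables here are tiny invented stand-ins (they are not normalised distributions, and the `floor` value for unseen bigrams is an arbitrary placeholder rather than a real estimate). The names `make_model`, `log_score` and `identify` are hypothetical.

```python
import math
from collections import Counter

def log_score(text, k, cond_prob):
    """Log probability of text under a model, leading term dropped:
    sum over distinct (k+1)-grams of count * log p(last char | context)."""
    counts = Counter(text[i:i + k + 1] for i in range(len(text) - k))
    return sum(n * math.log(cond_prob(g[:k], g[k])) for g, n in counts.items())

def identify(text, k, models):
    """Return the name of the model with the highest log score."""
    return max(models, key=lambda name: log_score(text, k, models[name]))

def make_model(table, floor=0.01):
    """Build a toy cond_prob from a bigram table.

    Unseen bigrams get the placeholder probability `floor` (an assumption
    made so that the logarithm is always defined).
    """
    def cond_prob(context, char):
        return table.get(context + char, floor)
    return cond_prob

# Hypothetical bigram probabilities, for illustration only.
MODELS = {
    "English": make_model({"th": 0.3, "he": 0.3, "sh": 0.1}),
    "Spanish": make_model({"qu": 0.9, "ue": 0.5, "ll": 0.2}),
}

guess_the = identify("the", 1, MODELS)
guess_que = identify("que", 1, MODELS)
```

With uniform priors the language with the larger log score is also the one with the larger posterior probability, so no prior term appears in `identify`.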
The remaining question is how to get reliable estimates of
the conditional probabilities $p$. This is where statistical language
modellers really spend their lives. Everything up to now is common ground
shared, in one way or another, by almost all applications. What remains is
task-specific and crucially important to the success of the
enterprise.
Chris Brew
8/7/1998