In the previous section we developed the idea that entropy is a
measure of the expected information gain from seeing the next
symbol of a ticker tape. The formula for this quantity, which
we called the entropy, is:

H(P) = -\sum_{w} P(w)\,\log_2 P(w)
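As a small illustration, here is a minimal Python sketch of this formula;
the four-symbol distribution is made up purely for the example:

    import math

    # Toy distribution over four ticker-tape symbols (made up for illustration).
    P = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

    # Entropy in bits: the expected surprisal -log2 P(w) under P itself.
    entropy = -sum(p * math.log2(p) for p in P.values() if p > 0)
    print(entropy)  # 1.75 bits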
Now we imagine that we are still watching a ticker tape whose
behaviour is still controlled by P(w), but that we have only imperfect
knowledge P_M(w) of the probabilities. That is, when we see w
we assess our information gain as -\log_2 P_M(w), not as the
correct -\log_2 P(w). Over time we will see symbols occurring with
their true distribution, so our estimate of the information
content of the signal will be:

H(P, P_M) = -\sum_{w} P(w)\,\log_2 P_M(w)
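Using the same made-up distribution as in the sketch above, together with a
deliberately imperfect uniform model, a quick Python check of this quantity
might look like this:

    import math

    # True distribution P and an imperfect model P_M (both made up for illustration).
    P   = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
    P_M = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}

    # Cross-entropy: the model's surprisal -log2 P_M(w), averaged under the true P(w).
    cross_entropy = -sum(p * math.log2(P_M[w]) for w, p in P.items() if p > 0)
    print(cross_entropy)  # 2.0 bits, larger than the 1.75-bit entropy of P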
This quantity is called the cross-entropy of the signal
with respect to the model P_M. It is a remarkable and
important fact that the cross-entropy with respect to any
incorrect probabilistic model is greater than the entropy
with respect to the correct model.
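One way to see this (it is Gibbs' inequality) is via the elementary bound
\ln x \ge 1 - 1/x for x > 0, applied with x = P(w)/P_M(w):

H(P, P_M) - H(P)
  = \sum_{w:\,P(w)>0} P(w)\,\log_2\frac{P(w)}{P_M(w)}
  \ge \frac{1}{\ln 2}\sum_{w:\,P(w)>0} P(w)\Bigl(1 - \frac{P_M(w)}{P(w)}\Bigr)
  = \frac{1}{\ln 2}\Bigl(1 - \sum_{w:\,P(w)>0} P_M(w)\Bigr) \ge 0,

with equality only when P_M(w) = P(w) for every symbol w that actually occurs.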
The reason this fact is important is that it provides us with
a justification for using cross-entropy as a tool for evaluating
models. This lets you organize the search for a good model
in the following way (a sketch of this loop appears after the list):
- Initialize your model with random (or nearly random)
parameters.
- Measure the cross-entropy.
- Alter the model slightly (in the hope of improving it).
- Measure again, accepting the new model if the
cross-entropy has improved.
- Repeatedly alter the model until it is good enough.
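As a minimal sketch of this loop, assuming a toy four-symbol tape and a
simple random-perturbation step (both invented for illustration):

    import math
    import random

    def cross_entropy(p_true, p_model):
        # Cross-entropy in bits of the model against the true distribution.
        return -sum(p * math.log2(p_model[w]) for w, p in p_true.items() if p > 0)

    def perturb(p_model, step=0.05):
        # Return a slightly altered copy of the model, renormalised to sum to 1.
        q = {w: max(p + random.uniform(-step, step), 1e-6) for w, p in p_model.items()}
        total = sum(q.values())
        return {w: p / total for w, p in q.items()}

    # True distribution of the tape and an initial (nearly random) uniform model.
    P = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
    P_M = {w: 0.25 for w in P}

    best = cross_entropy(P, P_M)
    for _ in range(10000):
        candidate = perturb(P_M)
        score = cross_entropy(P, candidate)
        if score < best:            # keep only changes that lower the cross-entropy
            P_M, best = candidate, score

    print(best)                     # approaches the true entropy of P, 1.75 bits

In practice P is not known and the cross-entropy would be estimated from the
symbols actually observed on the tape; using the known toy P here just keeps
the sketch short.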
If you are able to find a scheme which guarantees that the
alterations to the model will improve cross-entropy, then so
much the better, but even if not every change is an improvement,
the algorithm may still eventually yield good models.
Chris Brew
8/7/1998