Next: Hidden Markov Models and
Up: Probability and information
Previous: Summary and self-check
- 1.
- Exercise:
-
How many different single character sequences are
there in English text?
- Solution:
-
- There are various sensible answers to this.
- 2.
- Exercise:
-
How many different two-character sequences are
there in English text?
- Solution:
-
Assume we said 76 for the previous question. There
are the same number of choices for the second character,
so there are 76 x 76 = 5776
possibilities. Or are there?
What about the possibility that some sequences of characters don't
occur? For example ``sb'' is either rare or impossible in English
(though not in Italian). In fact there are only 1833 distinct
two-character sequences in my extract. What does this mean?
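The comparison between the free-choice count and the observed count can be tried on any text. The following is a minimal sketch, not the procedure used to obtain the 1833 figure above; the `sample` string and the helper name `distinct_ngrams` are placeholders introduced here for illustration, and a real extract should be substituted.

```python
from collections import Counter

def distinct_ngrams(text, n):
    """Count how often each n-character sequence occurs in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

# Hypothetical sample text; substitute a real corpus extract.
sample = "the cat sat on the mat"

unigrams = distinct_ngrams(sample, 1)
bigrams = distinct_ngrams(sample, 2)

alphabet_size = len(unigrams)
# Upper bound if both characters could be chosen freely.
upper_bound = alphabet_size ** 2

print("distinct characters:", alphabet_size)
print("distinct two-character sequences:", len(bigrams))
print("free-choice upper bound:", upper_bound)
```

The observed number of distinct pairs can never exceed the square of the observed alphabet size, and on real text it usually falls well short of it.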
- 3.
- Exercise:
-
How many different syllables are
there in English?
- Solution:
-
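A rough cut is to assume that English syllables are of the
form
C?C?VC?C?
where C is a consonant, V is a vowel, and ? marks the preceding
symbol as optional.
If we assume that there are roughly 10 distinct vowels and
20 distinct consonants, and that we have a free choice at every
position, then we get an upper bound of about
21 x 21 x 10 x 21 x 21, or roughly 2 million,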
possible syllables. Typical syllabic writing systems have
50-200 distinct signs (Japanese, which has a particularly
simple syllabary, with nearly
all open syllables like ``ma'', ``ka'', ``no'', makes do with
48). Clearly the assumption of independence is unwarranted
in this case.
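The free-choice bound can be checked with a few lines of arithmetic. This sketch assumes the syllable template is read as two optional consonants, a vowel, and two more optional consonants, so that each consonant slot has 21 options (one of 20 consonants, or nothing); the function name and parameter names are hypothetical.

```python
def syllable_upper_bound(n_consonants=20, n_vowels=10, optional_slots=4):
    """Upper bound on syllables when every slot is chosen independently."""
    # Each optional consonant slot is either empty or one of the consonants.
    per_slot = n_consonants + 1
    return per_slot ** optional_slots * n_vowels

print(syllable_upper_bound())
```

Under these assumptions the bound is several orders of magnitude above the 50-200 signs of a typical syllabary, which is the gap the solution draws attention to.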
- 4.
- Exercise:
-
How many different words are
there in English?
- Solution:
-
There may not be a good way of answering this, but it
is worth thinking about. One way is to go through
the same sort of argument that I just used for syllables,
assuming, say, that few words are longer than five syllables.
- 5.
- Exercise:
-
This question is about ``identical'' twins. It isn't always
possible to tell by inspection whether twins are monozygotic or
dizygotic. But monozygotic twins are always of the
same sex. Derive a formula for the proportion
of twins that are monozygotic from sex-ratio data alone. (Borrowed
from ``Bayesian Statistics'' by Peter M. Lee.)
- Solution:
-
Each pair of twins is either monozygotic M, or dizygotic
D, and either two girls GG, two boys BB, or a girl and a boy
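GB. Assuming that boys and girls are equally likely, and that the
sexes of dizygotic twins are independent, we have
P(GG|M) = P(BB|M) = 1/2, P(GB|M) = 0
P(GG|D) = P(BB|D) = 1/4, P(GB|D) = 1/2
from which you can deduce that
P(GG) = P(GG|M)P(M) + P(GG|D)P(D) = P(M)/2 + (1 - P(M))/4
and thence that
P(M) = 4P(GG)-1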
It's worth pointing out that if you were unlucky with your sample
(your provider of twins works at a single-sex boys' school) you would
get a strange estimate of P(GG) and this could feed through into
making your estimate of P(M) not only wrong but (being negative)
nonsensical.
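The formula P(M) = 4P(GG)-1 is easy to apply to counts of twin pairs, and doing so shows how a skewed sample produces a nonsensical value. This is a minimal sketch; the function name and the counts are invented for illustration and are not real data.

```python
def estimate_p_monozygotic(n_gg, n_total):
    """Plug the observed girl-girl proportion into P(M) = 4P(GG) - 1."""
    p_gg = n_gg / n_total
    return 4 * p_gg - 1

# Hypothetical counts of twin pairs, for illustration only.
balanced = estimate_p_monozygotic(n_gg=33, n_total=100)
boys_school = estimate_p_monozygotic(n_gg=0, n_total=100)

print("balanced sample:", balanced)
print("boys-only sample:", boys_school)
```

With no girl-girl pairs in the sample the estimate falls below zero, which cannot be a probability; any observed P(GG) below 1/4 has the same effect.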
Chris Brew
8/7/1998