Next: Example
Up: Converting between the two
Previous: Rabiner to Charniak
The reverse process is harder. In general it is impossible to
map a probabilistic finite state transducer into a balls and urns
model while preserving the number of states, because the latter
model is significantly more constrained. However, the two models remain
interconvertible if one allows the addition of new states.
As a simple example we use the network given on p64 of
Charniak, which is shown in figure 10.3.
Figure 10.3: The HMM from Charniak p64
The transformation which we use is a two stage process. The first
stage introduces an extra node for every pair of nodes that
are connected in the original graph, and shuffles the probabilities
such that the new nodes are of the form expected by Rabiner's model.
The graph produced by this first stage includes two
types of nodes:
1. those with emission probabilities
2. those without emission probabilities
The second stage therefore has the task of eliminating the
nodes which do not have emission probabilities, and the final
product is a network in Rabiner's format, but with a larger
number of nodes than the original.
Figure 10.4: First transformation of figure 10.3
The result of the first stage is shown in figure 10.4.
In this network there are non-zero transition probabilities from state
a to state a or state b, and from state b to state a, so we introduce
new nodes labelled a->a, a->b and b->a. The transition
probability to a new state is the sum of all the transitions which
lead from its source state to its target state, hence the transition
matrix is as in table 10.3. Emission probabilities are assigned
at each new state in accordance with the proportions of the transitions
to that state which bear the relevant symbols; thus at state a->a
we assign emission probabilities of 0.16/0.33 and 0.17/0.33, the
latter to the symbol 1.
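The first stage can be sketched in Python along the following lines. The (src, dst, symbol) arc representation is an assumption, and the split of the a->b and b->a mass between two symbols is invented for illustration; only the 0.16/0.17 split on the a->a arcs is given above.

```python
from collections import defaultdict

def arcs_to_states(arcs):
    """Stage 1: add a node for every connected (src, dst) pair.

    `arcs` maps (src, dst, symbol) -> probability of taking that arc
    while emitting that symbol (Charniak style, emission on the
    transition).  Returns the transition probabilities into the new
    nodes and the emission distribution at each new node, which is
    the form Rabiner's model expects.
    """
    trans = defaultdict(float)   # (src, new_node) -> summed probability
    emit = defaultdict(dict)     # new_node -> {symbol: probability}
    total = defaultdict(float)   # new_node -> total mass on that pair
    for (src, dst, sym), p in arcs.items():
        node = f"{src}->{dst}"
        trans[(src, node)] += p
        emit[node][sym] = emit[node].get(sym, 0.0) + p
        total[node] += p
    for node, dist in emit.items():      # normalise each emission
        for sym in dist:                 # distribution by the mass
            dist[sym] /= total[node]     # on its (src, dst) pair
    return dict(trans), dict(emit)

# Arcs of figure 10.3; symbol labels and the 0.30/0.37 and 0.50/0.50
# splits are hypothetical.
arcs = {("a", "a", "0"): 0.16, ("a", "a", "1"): 0.17,
        ("a", "b", "0"): 0.30, ("a", "b", "1"): 0.37,
        ("b", "a", "0"): 0.50, ("b", "a", "1"): 0.50}
trans, emit = arcs_to_states(arcs)
```

With these numbers trans[("a", "a->a")] comes out as 0.33 and emit["a->a"]["1"] as 0.17/0.33, matching the proportions described above.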
Table 10.3: The transition matrix of figure 10.4

                from state:   a     b     a->a  a->b  b->a
  goes to a                  0.0   0.0   1.0   0.0   1.0
  goes to b                  0.0   0.0   0.0   1.0   0.0
  goes to a->a               0.33  0.0   0.0   0.0   0.0
  goes to a->b               0.67  0.0   0.0   0.0   0.0
  goes to b->a               0.0   1.0   0.0   0.0   0.0
This HMM lacks emission probabilities for nodes a and b,
but a more important fact is that these nodes can be eliminated,
producing a more compact network. For every pair of transitions
(tij, tjk) such that j is an old state and i and k are
new states, we introduce a direct transition tik and let its
probability be the product of the probabilities of tij and tjk.
This gives the transition probabilities in table 10.4. The
initial probability of each new state j is pii tij,
where i is the old state from which j is reached.
The emission probabilities on the new states are left
unchanged.
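The second stage might look like the following sketch, under the same assumed representation as before; the starting distribution over the old states is invented for illustration.

```python
from collections import defaultdict

def eliminate_old_states(trans, init, old_states):
    """Stage 2: splice out the nodes that lack emission probabilities.

    For every pair of transitions (tij, tjk) with j an old state and
    i, k new states, add a direct transition tik with probability
    tij * tjk.  Initial probabilities move to the new states the same
    way: each new state j reachable from old state i gets
    init[i] * tij.
    """
    new_trans = defaultdict(float)
    new_init = defaultdict(float)
    for (i, j), p_ij in trans.items():
        if i not in old_states and j in old_states:
            # compose i -> j -> k into a direct i -> k transition
            for (j2, k), p_jk in trans.items():
                if j2 == j and k not in old_states:
                    new_trans[(i, k)] += p_ij * p_jk
        elif i in old_states and j not in old_states:
            # an old state's initial mass flows into the new state
            new_init[j] += init.get(i, 0.0) * p_ij
    return dict(new_trans), dict(new_init)

# The stage-one network of figure 10.4 (table 10.3), read column-wise.
trans = {("a", "a->a"): 0.33, ("a", "a->b"): 0.67, ("b", "b->a"): 1.0,
         ("a->a", "a"): 1.0, ("a->b", "b"): 1.0, ("b->a", "a"): 1.0}
init = {"a": 1.0}            # assumed: the original model starts in a
new_trans, new_init = eliminate_old_states(trans, init, {"a", "b"})
```

Each surviving state's outgoing probabilities still sum to one, and with the assumed start in a, new_init assigns 0.33 to a->a and 0.67 to a->b.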
Table 10.4: The Rabiner version of Charniak p64
Figure 10.5: Final transformation of figure 10.3
The result of the full process is shown in figure 10.5.
The HMMs used in the Xerox tagger have the same form as those used by
Rabiner, but the initial distribution is taken to be the probability
distribution of a state hallucinated one position before the first word
of the sentence proper.
Chris Brew
8/7/1998