
Charniak to Rabiner

The reverse process is harder. In general it is impossible to map a probabilistic finite-state transducer into a balls-and-urns model while preserving the number of states, because the latter model is significantly more constrained. However, the two can still be interconverted if one allows the addition of new states.

As a simple example we use the network given on p. 64 of Charniak, which is shown in figure 10.3.

  
Figure 10.3: The HMM from Charniak p 64
\begin{figure}

\fbox {
\mbox{\includegraphics{Figures/charniak-p64.ps}}
}\end{figure}

The transformation we use is a two-stage process. The first stage introduces an extra node for every pair of nodes that are connected in the original graph, and shuffles the probabilities so that the new nodes have the form expected by Rabiner's model. The graph produced by this first stage contains two types of nodes:
1.
those with emission probabilities, and
2.
those without emission probabilities.
The second stage therefore has the task of eliminating the nodes which do not have emission probabilities, and the final product is a network in Rabiner's format, but with a larger number of nodes than the original.
  
Figure 10.4: First transformation of figure 10.3
\begin{figure}

\fbox {
\mbox{\includegraphics{Figures/rabiner-p64.ps}}
}\end{figure}

The result of the first stage is shown in figure 10.4.

In this network there are non-zero transition probabilities from state a to state a or state b, and from state b to state a, so we introduce new nodes labelled a->a, a->b and b->a. The transition probability into a new state is the sum of all the transitions which lead from its source state to its target state; hence the transition matrix is as in table 10.3. Emission probabilities are assigned at each new state in proportion to the transitions into that state which bear the relevant symbols: thus at state a->a we assign an emission probability of 0.16/0.33 to 0 and 0.17/0.33 to 1.
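The first stage of the transformation can be sketched in a few lines of Python. The arc representation below is an assumption: the 0.16/0.17 split on the a->a arc comes from the worked example in the text, while the symbols carried by the a->b and b->a arcs are read off the emission probabilities in table 10.4.

```python
from collections import defaultdict

# Arcs of the original Charniak-style transducer, as
# (src, dst) -> list of (symbol, probability) pairs.
# The exact symbol labels on a->b and b->a are assumptions
# consistent with table 10.4.
arcs = {
    ('a', 'a'): [('0', 0.16), ('1', 0.17)],
    ('a', 'b'): [('0', 0.67)],
    ('b', 'a'): [('1', 1.00)],
}

# Stage 1: one new state per connected pair of old states; the arc's
# total mass becomes a transition probability, and its per-symbol
# split becomes the new state's emission distribution.
trans = defaultdict(float)   # (src_state, dst_state) -> probability
emit = {}                    # new_state -> {symbol: probability}

for (i, j), pairs in arcs.items():
    new = f'{i}->{j}'
    total = sum(p for _, p in pairs)
    trans[(i, new)] += total    # old source feeds the new state
    trans[(new, j)] = 1.0       # new state hands control to the old target
    emit[new] = {s: p / total for s, p in pairs}
```

Running this reproduces the figures in the text: `trans[('a', 'a->a')]` is 0.33, and `emit['a->a']` assigns roughly 0.48 to symbol 0 and 0.52 to symbol 1.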

 

 
Table 10.3: The transition matrix of figure 10.4

from state:    a     b     a->a  a->b  b->a
goes to a      0.0   0.0   1.0   0.0   1.0
goes to b      0.0   0.0   0.0   1.0   0.0
goes to a->a   0.33  0.0   0.0   0.0   0.0
goes to a->b   0.67  0.0   0.0   0.0   0.0
goes to b->a   0.0   1.0   0.0   0.0   0.0


This HMM lacks emission probabilities for nodes a and b, but more importantly these nodes can be eliminated, producing a more compact network. For every pair of transitions (tij,tjk) such that j is an old state and i and k are new states, we introduce a direct transition tik and let its probability be the product of the probabilities of tij and tjk. This gives the transition probabilities in table 10.4. The initial probabilities are given by $\pi(j) = \sum_i \pi(i) t_{ij}$, where j is a new state and i is an old state. The emission probabilities on the new states are left unchanged.
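The elimination step can be sketched as follows. This is a minimal illustration, not a general node-elimination routine: it starts from the stage-1 transitions of table 10.3 and assumes, for the initial probabilities, that the original model starts in state a with probability 1 (the source does not state $\pi$ for Charniak's network).

```python
# Stage-1 transitions (table 10.3). Old states 'a' and 'b' carry no
# emissions and are to be eliminated.
old = {'a', 'b'}
trans = {
    ('a', 'a->a'): 0.33, ('a', 'a->b'): 0.67, ('b', 'b->a'): 1.0,
    ('a->a', 'a'): 1.0, ('a->b', 'b'): 1.0, ('b->a', 'a'): 1.0,
}

# For every path i -> j -> k with j old and i, k new, introduce a
# direct transition i -> k whose probability is the product.
new_trans = {}
for (i, j), p_ij in trans.items():
    if i not in old and j in old:
        for (j2, k), p_jk in trans.items():
            if j2 == j and k not in old:
                new_trans[(i, k)] = new_trans.get((i, k), 0.0) + p_ij * p_jk

# Initial probabilities pi(j) = sum_i pi(i) * t(i, j), assuming
# (hypothetically) that the original model starts in state a.
pi_old = {'a': 1.0, 'b': 0.0}
pi_new = {j: sum(pi_old[i] * trans.get((i, j), 0.0) for i in old)
          for j in ('a->a', 'a->b', 'b->a')}
```

Under these assumptions `new_trans` contains, for example, a->a going to a->b with probability 0.67 and a->b going to b->a with probability 1.0, matching the elimination rule in the text.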
 
Table 10.4: The Rabiner version of Charniak p. 64
\begin{table}
Initial probabilities: $\pi(a{\rightarrow}a) = 0.33\,\pi(a)$, $\pi(a{\rightarrow}b) = 0.67\,\pi(a)$, $\pi(b{\rightarrow}a) = \pi(b)$.

Emission probabilities:
\begin{displaymath}
\begin{array}{cccc}
 & a{\rightarrow}a & a{\rightarrow}b & b{\rightarrow}a \\
0 & 0.48 & 1 & 0 \\
1 & 0.52 & 0 & 1
\end{array}
\end{displaymath}\end{table}


  
Figure 10.5: Final transformation of figure 10.3
\begin{figure}

\fbox {

\mbox{\includegraphics{Figures/rabiner-p64b.ps}}
}\end{figure}

The result of the process is shown in figure 10.5. The HMMs used in the Xerox tagger have the same form as those used by Rabiner, but $\Pi$ is taken to be the probability distribution of a state hallucinated one position before the first word of the sentence proper.


Chris Brew
8/7/1998