Regularizing Word Alignment
Thomas Schoenemann
Abstract:
This talk presents improvements to the conditional models IBM1 and HMM for word alignment, obtained by adding prior knowledge in the form of regularity terms. We explore $L_0$ and (weighted) $L_1$ norms to address two common defects: the garbage collection problem, and the fact that words are aligned to many more distinct words than desired.
The computational methods employed are quite diverse: for the $L_0$ norm, a
discrete optimization approach derived from maximum approximations is used. In
contrast, the $L_1$ norm is handled by EM with efficient projections onto
simplices.
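To make the phrase "projections onto simplices" concrete, here is a minimal sketch of the standard sort-based Euclidean projection onto the probability simplex. The abstract does not say which projection routine the talk uses, so this is only an illustration of the general operation; the function name `project_to_simplex` is hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x_i >= 0, sum(x) == 1}.

    Standard sort-based method, O(n log n). Illustrative only; not
    necessarily the routine used in the talk.
    """
    u = sorted(v, reverse=True)      # entries in decreasing order
    cumsum = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j       # candidate shift using the j largest entries
        if uj - t > 0:               # entry j stays positive after the shift
            theta = t
    # Shift every entry by theta and clip negatives to zero.
    return [max(x - theta, 0.0) for x in v]


if __name__ == "__main__":
    print(project_to_simplex([2.0, 0.0]))
    print(project_to_simplex([0.2, 0.2]))
```

In a regularized EM setting, a step of this kind would map an updated parameter vector back to a valid probability distribution; how exactly it is combined with the weighted $L_1$ term is left to the talk.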
Bio:
Thomas Schoenemann was born and grew up in Germany. He studied
Computer Science at RWTH Aachen, Germany, where he received his diploma in
2005 with a thesis on confidence measures in machine translation,
written in the group of Hermann Ney. Afterwards he went to the
University of Bonn, Germany, where he worked on his Ph.D. thesis in
computer vision from 2006 to 2008. Until the end of March
he was a postdoc in the vision group at Lund University, Sweden,
where he also resumed his work on translation. He is currently
looking for a new group while exploring different fields.
For scheduling information, please see the Stuttgart reading group page.