The success of statistical machine translation systems such as Moses, Language Weaver and Google Translate has shown that it is possible to build high-performance machine translation systems with a small amount of effort using statistical learning techniques.
We are organizing a reading group on statistical machine translation (including work on statistical parsing). The intended audience is wide, including students and researchers in the areas of computational linguistics, linguistics, natural language processing, artificial intelligence and machine learning; everyone is invited.
The language of the reading group is English.
After several introductory lectures, members of the group will take turns giving informal presentations of research papers. Our initial goal is to reach the point where we are able to read about and discuss new ideas in statistical machine translation research involving the integration of linguistic representations ranging from deep to shallow.
The reading group was organized by Alex Fraser and Helmut Schmid from 2008 to 2010; Alex Fraser is currently organizing it.
Email Address: SubstituteLastName@ims.uni-stuttgart.de
Institute for Natural Language Processing (IMS/IfNLP)
SFB 732 - Incremental Specification in Context
LOCATION: We will have a single meeting on November 2nd in the IMS phonetics lab, 3.11 (top floor, last room on the right, Institut fuer Maschinelle Sprachverarbeitung, Azenbergstrasse 12, Stuttgart).
Try building your own Moses system!
Future and Present
2011
November 2nd, 16:00, 3.11 | Hassan Sajjad: IJCNLP practice talk: Comparing Two Techniques for Learning Transliteration Models Using a Parallel Corpus | |
TBD | Fabienne Braune: Markos Mylonakis and Khalil Sima'an. Learning Hierarchical Translation Structure with Linguistic Annotations. ACL-HLT 2011. | paper |
TBD | TBD: Libin Shen, Jinxi Xu and Ralph Weischedel. A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model. ACL 2008, outstanding paper award. | paper, see also slides |
Past
2011
August 3rd, 10:30, room 3.11 | Thomas Schoenemann: Regularizing Word Alignment. For a full abstract, click here. | |
July 13th, 14:30, 12.21 | Nadir Durrani: M. Galley, J. Graehl, K. Knight, D. Marcu, S. DeNeefe, W. Wang, and I. Thayer. Scalable Inference and Training of Context-Rich Syntactic Models. ACL-COLING 2006. | paper Nadir's slides: pdf ppt |
June 15th, 14:30, 12.21 | ACL Practice Talks. Nadir Durrani: A Joint Sequence Translation Model with Integrated Reordering, Andreas Maletti: How to train your multi bottom-up tree transducer | Durrani paper Maletti paper |
June 8th, 14:30, 12.21 | Hassan Sajjad: David Chen and William Dolan. Collecting Highly Parallel Data for Paraphrase Evaluation. ACL 2011. | paper |
June 1st, 14:30, 12.21 | Anita Gojun: Dmitriy Genzel. Automatically Learning Source-side Reordering Rules for Large Scale Machine Translation. COLING 2010. | paper |
May 18th, 14:30, 12.21 | Marion Weller: Beatrice Daille, Emmanuel Morin. Effective Compositional Model for Lexical Alignment. IJCNLP 2008. | paper |
May 11th, 14:30, 12.21 | Daniel Quernheim: Michel Galley, Mark Hopkins, Kevin Knight, Daniel Marcu. What's in a translation rule? NAACL 2004. | paper Daniel Q's slides |
April 20th, 10:30, 12.21 | Alex Fraser: Andreas Zollmann, Ashish Venugopal, Franz Och and Jay Ponte. A Systematic Comparison of Phrase-Based, Hierarchical and Syntax-Augmented Statistical MT. COLING 2008. | paper |
April 13th, 10:30, 12.21 | Anita Gojun: Rule-based and lattice-based approaches for determining the placement of German verbs in English to German SMT | |
April 6th, 10:30, 12.21 | Daniel Quernheim: Hyper-minimisation of weighted finite automata | |
March 30th, 10:30, 12.21 | Nadir Durrani: A Joint Sequence Translation Model with Integrated Reordering | |
March 23rd, 10:30, 12.21 | Fabienne Braune: Spence Green, Michel Galley, Christopher D. Manning. Improved Models of Distortion Cost for Statistical Machine Translation. NAACL 2010. | paper see also slides/data/code |
March 16th, 10:30, 12.21 | Alex Fraser: Morphological Generation of German for Statistical Machine Translation. (Joint work with Marion Weller, Aoife Cahill, Fabienne Cap, in the DFG project Models of Morphosyntax for SMT) | |
March 2nd, 10:30, 12.21 | Andreas Maletti: Tree Transducers in Machine Translation. For a full abstract, click here. |
2010
November 9th, 14:00, 3.11 | Alex Fraser: Introduction to statistical machine translation - Part 4. Log-linear models for SMT (this is a repeat with some improvements, you should be familiar with the phrase-based model described at the beginning of lecture 3, see lecture 3 slides below) | slides |
August 11th, 14:00, 3.11 | Fabienne Fritzinger: Kristina Toutanova, Hisami Suzuki, and Achim Ruopp. Applying Morphology Generation Models to Machine Translation. ACL 2008. | paper poster |
August 4th, 14:00, 3.11 | Fabienne Braune: Steve DeNeefe and Kevin Knight. Synchronous Tree Adjoining Machine Translation. EMNLP 2009. | paper |
July 28th, 14:00, 3.11 | Fabienne Braune: Anders Søgaard and Jonas Kuhn. Empirical lower bounds on alignment error rates in syntax-based machine translation. SSST 2009. | paper |
July 21st, 14:00, 3.11 | Helmut Schmid: Michel Galley and Christopher D. Manning. Accurate Non-Hierarchical Phrase-Based Translation. NAACL 2010. | paper |
July 7th, 14:00, 3.11 | Nadir Durrani, Fabienne Fritzinger: ACL practice talks | |
June 30th, 14:00, 3.11 | Alex Fraser: Introduction to statistical machine translation - Part 5. Advanced topics in SMT. Discriminative bitext alignment, morphological processing, syntax | slides |
June 23rd, 14:00, 3.11 | Nadir Durrani: Michel Galley, Christopher Manning. A Simple and Effective Hierarchical Phrase Reordering Model. EMNLP 2008. | paper |
June 16th, 14:00, 3.11 | Fabienne Fritzinger: Philipp Koehn, Franz Josef Och, Daniel Marcu. Statistical Phrase-Based Translation. HLT-NAACL 2003. | paper |
June 9th, 14:00, 3.11 | Patrick Leucht: Robert C. Moore. Fast and Accurate Sentence Alignment of Bilingual Corpora. AMTA 2002. | paper |
June 2nd, 14:00, 3.11 | Fabienne Braune: Michael Collins and Philipp Koehn and Ivona Kucerova. Clause Restructuring for Statistical Machine Translation. ACL 2005. | paper |
May 19th, 14:00, 3.11 | Alex Fraser: Introduction to statistical machine translation - Part 4. Log-linear models for SMT | slides |
May 12th, 14:00, 3.11 | Alex Fraser: Introduction to statistical machine translation - Part 3. Decoding (automatically translating a text given an already learned model) | slides |
May 5th, 14:00, 3.11 | Alex Fraser: Introduction to statistical machine translation - Part 2. Bitext alignment (extracting lexical knowledge from parallel corpora) | slides Reading: Kevin Knight's SMT Tutorial (concentrate on Model 1) If you have time: Implement Model 1! |
April 28th, 14:00, 3.11 (IMS Phonetik Labor) | Alex Fraser: Introduction to statistical machine translation - Part 1. Introduction, basics of statistical machine translation (SMT), evaluation of MT | slides |
2009
Wed July 22nd, 15:45, 3.11 (IMS Phonetik Labor) | Alex Fraser: David Chiang. A hierarchical phrase-based model for statistical machine translation. ACL 2005 (best paper) | paper |
Wed July 8th, 15:45, 3.11 (IMS Phonetik Labor) | Alex Fraser: Christoph Tillmann. A Unigram Orientation Model for Statistical Machine Translation. HLT-NAACL 2004 short paper. | paper |
June 4th, 10:30, Office Hans Kamp | Fabienne Braune: Dekai Wu. A polynomial-time algorithm for statistical machine translation. ACL 1996. | paper (ps) (pdf) |
May 28th, 9:45-11:15, 12.21 | As part of the advanced seminar (Hauptseminar) Maschinelle Übersetzung I (Heid), PD Dr. Kurt Eberle (Heidelberg/Stuttgart) speaks on "Current Architectural Questions in Machine Translation: Semantic Transfer and the Integration of Statistical Information in 'translate'" | |
May 14th, 10:30 | Alex Balabanov: Kenji Yamada and Kevin Knight. A syntax-based statistical translation model. ACL 2001. | paper |
May 7th, 10:30 | Hassan Sajjad: Yaser Al-Onaizan and Kevin Knight. Translating Named Entities Using Monolingual and Bilingual Resources. ACL 2002. | paper |
April 30th, 10:30 | Hassan Sajjad, Alex Fraser: EACL 2009 report (interesting papers), organizational meeting | |
March 26th, 10:30 | Aoife Cahill, Alex Fraser, Hassan Sajjad: Practice talks for EACL | Papers: Cahill Fraser1 Fraser2 Sajjad |
March 19th, 10:30 | Helmut Schmid: Liang Huang. Forest Reranking: Discriminative Parsing with Non-Local Features. ACL 2008 (1 of 2 outstanding paper awards). | paper |
March 5th, 10:30 | Alex Fraser: Chris Quirk, Arul Menezes, Colin Cherry. Dependency Treelet Translation: Syntactically Informed Phrasal SMT. ACL 2005. Part II: decoding, experiments, discussion. | |
Feb 26th, 10:30 | Our first paper on a non-preprocessing approach to syntactic SMT! Alex Fraser: Chris Quirk, Arul Menezes, Colin Cherry. Dependency Treelet Translation: Syntactically Informed Phrasal SMT. ACL 2005. Part I: model and training. | paper |
Feb 19th, 10:30 | Two papers on preprocessing approaches for coping with composita and rich inflection: Fabienne Fritzinger: Empirical Methods for Compound Splitting. Philipp Koehn and Kevin Knight. EACL 2003. Alex Fraser: Improving Statistical MT Through Morphological Analysis. Sharon Goldwater and David McClosky. EMNLP 2005. | composita paper inflection paper |
Feb 12th, 10:30 | Hassan Sajjad: Michael Collins and Philipp Koehn and Ivona Kucerova. Clause Restructuring for Statistical Machine Translation. ACL 2005. | paper |
Feb 5th, 10:30 | Alex Fraser: Franz Josef Och. Minimum Error Rate Training for Statistical Machine Translation. ACL 2003. | paper |
Jan 22nd, 10:30 | Amit Dubey: Franz Josef Och, Hermann Ney. Discriminative Training and Maximum Entropy Models for Statistical Machine Translation. ACL 2002 (best paper). | paper |
2008
Dec 18th, 10:30 | Amit Dubey: Hoifung Poon and Pedro Domingos. EMNLP 2008. Joint Unsupervised Coreference Resolution with Markov Logic. | paper |
Dec 11th, 10:30 | Amit Dubey: Richardson and Domingos. Machine Learning, 62, 107-136, 2006. Markov Logic Networks. | paper |
Dec 4th, 10:30, IMS Mitarbeiter Zimmer | Amit Dubey: Agirre, Baldwin and Martinez. ACL 2008. Improving Parsing and PP Attachment Performance with Sense Information. | paper |
Nov 27th, 10:30, IMS Mitarbeiter Zimmer | Aoife Cahill: Michael Collins. Discriminative Reranking for Natural Language Parsing. ICML 2000. | paper |
Nov 20th, 10:30, IMS Mitarbeiter Zimmer | Alex Balabanov: Michael Collins. Three Generative, Lexicalised Models for Statistical Parsing. ACL/EACL 1997. You might also be interested in the slides for this paper or the longer Computational Linguistics journal paper (see Michael Collins' homepage) | paper |
Nov 13th, 10:30, IMS Mitarbeiter Zimmer | Nadir Durrani: Statistical Phrase-Based Translation (HLT-NAACL 2003). Philipp Koehn, Franz Josef Och, Daniel Marcu | Statistical Phrase-Based Translation |
Nov 6th, 10:30, IMS Mitarbeiter Zimmer | Alex Fraser: BLEU: a Method for Automatic Evaluation of Machine Translation (ACL 2002). Kishore Papineni, Salim Roukos, Todd Ward, Wei-Jing Zhu | BLEU paper |
October 23rd, 10:30, 3.11 (IMS Phonetik Labor) | Christian Scheible: Introduction to Language Modeling. For slides and reference list, please click here. | Chen and Goodman LM tutorial (focus on interpolation and Kneser/Ney smoothing) |
October 9th, 10:30, 12.21 | Martin Forst: Grammatical Machine Translation II. Martin Forst will discuss recent work on hybrid MT using LFG. For a full abstract, click here. | |
October 2nd, 10:30, 12.21 | Helmut Schmid: PCFG parsing algorithms continued | No required reading. |
September 25th, 10:00, 12.21 | Helmut Schmid: PCFG parsing algorithms | No required reading. |
August 14th, 10:00, 12.21 (IMS lecture hall) | Helmut Schmid: Introduction to CFG parsing algorithms | No required reading. |
August 7th, 10:00, 12.21 (IMS lecture hall) | Helmut Schmid: Introduction to HMM tagging | No required reading. Manning and Schuetze HMM Chapter recommended. |
July 31st, 10:00, 12.21 (IMS lecture hall) | Alex Fraser: Introduction to statistical machine translation - Part 3, phrase-based modeling and decoding | no required reading |
July 21st to July 25th | EMA Summer School, website is here. First two SMT lectures will be repeated on Tuesday, along with a practice assignment (implementing IBM Model 1). The lecture from the next reading group meeting (phrase-based modeling and decoding) will be on Wed at 14:00, followed by a practice assignment on decoding. Thursday morning's lecture will consist of a discussion of the assignments and a brief overview of some more advanced topics. | |
July 17th, 10:00, 3.11 (IMS Phonetik Labor) | Alex Fraser: Introduction to statistical machine translation - Part 2. Bitext alignment (extracting lexical knowledge from parallel corpora) | Kevin Knight's SMT Tutorial |
July 10th, 10:00, 3.11 (IMS Phonetik Labor) | Alex Fraser: Introduction to statistical machine translation - Part 1. I will define the MT problem and talk about evaluation. I will also discuss parallel corpora and sentence alignment and give a brief overview of statistical machine translation (SMT). Kevin Knight's tutorial is recommended, but not necessary until next week. | Kevin Knight's SMT Tutorial |
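Several of the introductory sessions above suggest implementing IBM Model 1 as an exercise. For readers who want a starting point, here is a minimal sketch of Model 1 training with EM. It is not the assignment solution or the code used in the lectures. The function name `train_ibm_model1`, the `<NULL>` token, and the three-sentence toy corpus are assumptions made for illustration. The table `t[(f, e)]` holds the lexical translation probability t(f | e), following the notation in Kevin Knight's tutorial.

```python
from collections import defaultdict

NULL = "<NULL>"  # hypothetical name for the empty English word


def train_ibm_model1(bitext, iterations=10):
    """Estimate t(f | e) with EM.

    bitext: list of (foreign_tokens, english_tokens) sentence pairs.
    Returns a dict-like table t with t[(f, e)] = t(f | e).
    """
    foreign_vocab = {f for fs, _ in bitext for f in fs}
    # Start from a uniform distribution over foreign words.
    t = defaultdict(lambda: 1.0 / len(foreign_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e being linked

        # E-step: distribute each foreign word over the English words
        # of its sentence (plus NULL) in proportion to the current t.
        for fs, es in bitext:
            es_null = [NULL] + list(es)
            for f in fs:
                z = sum(t[(f, e)] for e in es_null)
                for e in es_null:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c

        # M-step: renormalize the expected counts per English word.
        t = defaultdict(
            float, {(f, e): count[(f, e)] / total[e] for (f, e) in count}
        )
    return t


if __name__ == "__main__":
    # Toy corpus, assumed for illustration only.
    toy = [
        ("das Haus".split(), "the house".split()),
        ("das Buch".split(), "the book".split()),
        ("ein Buch".split(), "a book".split()),
    ]
    table = train_ibm_model1(toy, iterations=10)
    for (f, e), p in sorted(table.items(), key=lambda kv: -kv[1])[:5]:
        print(f"t({f} | {e}) = {p:.3f}")
```

On a real parallel corpus you would also want sentence-length limits and a cutoff for very small probabilities to keep the table manageable; the sketch omits both for brevity.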