WP 1: Lexicon, Syntax, Semantics: SMT and NMT

Summary

Statistical Machine Translation (SMT) was the dominant approach used for online translation until 2015. Neural Machine Translation (NMT) is the new dominant approach.

Neural Machine Translation (NMT) is a new paradigm in data-driven machine translation. Previous-generation Statistical Machine Translation (SMT) systems are built from a collection of heuristic sub-models, typically combined in a log-linear model with a small number of parameters. In NMT, the entire translation process is instead posed as an end-to-end supervised classification problem, with pairs of sentences as training data. In an SMT system, word alignment is carried out and then fixed, and the various sub-models are estimated from the word-aligned data; an NMT system does not use fixed word alignments, and instead handles the full sequence-to-sequence task in a single model.
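As a rough illustration (standard textbook notation, not taken from the seminar materials), the contrast can be written as follows: the SMT system scores candidate translations e of a source sentence f with a weighted combination of feature functions, while the NMT system models the target sentence directly with a single parameterized conditional distribution.

    % Log-linear SMT: feature functions h_m (translation model, language model, ...)
    % with weights lambda_m, tuned for example by Minimum Error Rate Training.
    \hat{e} = \arg\max_{e} \sum_{m=1}^{M} \lambda_m \, h_m(e, f)

    % NMT: one neural model with parameters theta predicts each target word
    % given the source sentence f and the previously generated target words.
    p(e \mid f; \theta) = \prod_{t=1}^{|e|} p(e_t \mid e_{<t}, f; \theta)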

Content:

The seminar will begin with the basics of Statistical Machine Translation and then briefly introduce Deep Learning before covering the basics of Neural Machine Translation.

Goals:

The goal of the seminar is to understand the basics of SMT and NMT. The varying role of the lexicon (and representations of the lexicon) in these approaches is a critical aspect which will be a focus of study.

Instructor

Alexander Fraser

Email Address: SubstituteMyLastName@cis.uni-muenchen.de

CIS, LMU Munich


Schedule


Room U139, Tuesdays, 16:00 to 18:00 (c.t.)


Date Topic Reading (DO BEFORE THE MEETING!) Slides
October 18th Introduction to Statistical Machine Translation ppt pdf
October 25th Bitext alignment (extracting lexical knowledge from parallel corpora) ppt pdf
November 8th Many-to-many alignments and Phrase-based model ppt pdf
November 15th Log-linear model and Minimum Error Rate Training (Fraser); Referat session (Braune/Huck) ppt pdf
November 22nd Decoding (Guest Lecture from Tsuyoshi Okita) pdf
November 29th Introduction to Linear Models (SLIDES UPDATED!) pptx pdf
December 6th Neural Networks (and Word Embeddings), Fabienne Braune pdf
December 13th Recurrent Neural Networks, Tsuyoshi Okita pdf
December 20th SMT: Advanced Word Alignment, Morphology, Syntax ppt pdf
January 24th Neural Machine Translation, Matthias Huck pdf



Referat topics (student presentations; name: topic)


Date Topic Materials Term paper (Hausarbeit) received
January 10th Palchik: Word Sense Disambiguation and WSD for SMT yes
January 10th Deck: Computer-Aided Translation yes
January 17th Bilan: Cross-Lingual Lexical Substitution yes
January 17th Sedinkina: Wikification of Ambiguous Entities yes
January 24th SEE ABOVE
January 31st Poerner: System Combination yes
January 31st Krachenfels: Neural Parsing with Gated Recursive Convolutional Networks yes


Literature:

Philipp Koehn's book Statistical Machine Translation

Kevin Knight's tutorial on SMT (particularly look at IBM Model 1)
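
The IBM Model 1 part of the Knight tutorial is a good entry point to the word-alignment step described in the summary above. As a minimal, illustrative sketch (the toy bitext, variable names, and the number of EM iterations below are assumptions for illustration, not taken from the tutorial or the seminar materials), IBM Model 1 can be trained with expectation maximization in a few lines of Python:

    # Minimal IBM Model 1 EM sketch (toy data; no NULL word, for simplicity).
    from collections import defaultdict

    # Toy parallel corpus: (foreign sentence, English sentence) as token lists.
    bitext = [
        ("das haus".split(), "the house".split()),
        ("das buch".split(), "the book".split()),
        ("ein buch".split(), "a book".split()),
    ]

    # Initialize the lexical translation probabilities t(e|f) uniformly.
    e_vocab = {e for _, es in bitext for e in es}
    t = defaultdict(lambda: 1.0 / len(e_vocab))

    for _ in range(10):  # a few EM iterations
        count = defaultdict(float)  # expected counts c(e, f)
        total = defaultdict(float)  # expected counts c(f)
        for fs, es in bitext:
            for e in es:
                # E-step: distribute each English word over the foreign words
                # in proportion to the current t(e|f).
                z = sum(t[(e, f)] for f in fs)
                for f in fs:
                    frac = t[(e, f)] / z
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: re-estimate t(e|f) from the expected counts.
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]

    # Show the highest-probability translation pairs after training.
    for (e, f), p in sorted(t.items(), key=lambda kv: kv[1], reverse=True)[:4]:
        print(f"t({e} | {f}) = {p:.2f}")

After a few iterations the expected counts concentrate the probability mass on the correct word pairs, e.g. t(house | haus) grows while t(the | haus) shrinks, which is exactly the kind of lexical knowledge that phrase-based SMT builds on.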