In the first part, we will discuss the general problem of machine translation (the automatic translation of text from one language to another) and the history of machine translation research. We will then briefly consider older approaches to machine translation that predate the current focus on machine learning, present some of the natural language processing challenges that must be solved on the way to general approaches for machine translation, and finally discuss the important topic of evaluating machine translation systems.
In the second part, we will look at statistical machine translation (SMT), which was the dominant paradigm in machine translation from about 2000 to 2015 and remains at the core of many industrial systems. We will introduce the related concepts of translational equivalence (established through word alignment), simple statistical models, and search algorithms.
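As a taste of what "translational equivalence established through word alignment" means in practice, here is a minimal sketch of IBM Model 1 trained with expectation-maximization on a toy parallel corpus. The corpus, variable names, and iteration count are illustrative assumptions, not course code; see Koehn's book or Knight's tutorial in the literature list for the full derivation.

```python
# Minimal sketch of IBM Model 1 EM training on a toy parallel corpus.
# Toy data and names are illustrative only, not course material.
from collections import defaultdict

# Each pair is (foreign sentence, English sentence), already tokenized.
bitext = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]

# Initialize t(e|f) uniformly over the English vocabulary.
e_vocab = {e for _, es in bitext for e in es}
t = defaultdict(lambda: 1.0 / len(e_vocab))

for _ in range(10):  # EM iterations
    count = defaultdict(float)   # expected counts c(e, f)
    total = defaultdict(float)   # expected counts c(f)
    for fs, es in bitext:
        for e in es:
            # E-step: distribute the probability mass of e over the foreign words.
            z = sum(t[(e, f)] for f in fs)
            for f in fs:
                c = t[(e, f)] / z
                count[(e, f)] += c
                total[f] += c
    # M-step: re-estimate the translation probabilities from the expected counts.
    for (e, f), c in count.items():
        t[(e, f)] = c / total[f]

# 'house' should now clearly outweigh 'the' as a translation of 'haus'.
print(t[("house", "haus")], t[("the", "haus")])
```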
In the third and last part of the lecture, we will consider the deep learning approaches used in neural machine translation (NMT). We will briefly introduce word embeddings and deep learning, give a high-level overview of recurrent neural network (RNN) and Long Short-Term Memory (LSTM) approaches to translation, follow up with the state-of-the-art Transformer approach, and discuss transfer learning (with applications beyond NMT).
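For a concrete flavour of the Transformer material, here is a minimal sketch of scaled dot-product attention, the core operation of the Transformer, written in plain PyTorch. The function name, tensor shapes, and toy input are illustrative assumptions, not course code.

```python
# Minimal sketch of scaled dot-product attention (the core of the Transformer).
# Shapes and names are illustrative only.
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_model)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (batch, seq, seq)
    weights = torch.softmax(scores, dim=-1)             # attention weights
    return weights @ v                                  # weighted sum of the values

# Toy example: batch of 1, sequence of 3 tokens, model dimension 4 (self-attention).
x = torch.randn(1, 3, 4)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # torch.Size([1, 3, 4])
```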
Goals
Theoretical understanding of the challenges of machine translation and the models used to solve them.
Practical experience in solving sub-problems of machine translation, as well as familiarity with the data used for training statistical models.
Email Address: SubstituteMyLastName@cis.uni-muenchen.de
Tuesdays, 16 to 18 (c.t.). ONLINE WITH ZOOM. Link will be sent to students listed in LSF.
Wednesdays, 14 to 16 (c.t.). ONLINE WITH ZOOM. Link will be sent to students listed in LSF.
Date | Topic | Reading (DO AFTER THE MEETING!) | Slides | Video |
April 14th | Orientation and Introduction to Machine Translation | | | mp4 |
April 20th | Introduction to Statistical Machine Translation | | ppt pdf | mp4 |
April 27th | Bitext alignment (extracting lexical knowledge from parallel corpora) | Optional: read about Model 1 in Koehn and/or Knight (see below) | ppt pdf | mp4 |
April 28th | Many-to-many alignments and Phrase-based model | | ppt pdf | mp4 |
May 4th | Log-linear model and Minimum Error Rate Training | | ppt pdf | mp4 |
May 5th | Decoding | | | mp4 |
May 11th | Exercise 1 Released. Due Monday May 17th at 15:00. | | exercise1.txt | mp4 |
May 12th | Linear Models | | pptx pdf | part1 mp4 part2 mp4 |
May 18th | Exercise 2 Released. Due Monday May 31st at 15:00. | | exercise2.html tamchyna_acl_2016_slides.pdf tamchyna_acl_2016_slides.pptx | mp4 |
May 19th | Neural Networks (and Word Embeddings) | | | mp4 (skip first 60 seconds) |
May 25th | Pfingstdienstag (Whit Tuesday, holiday) | | | |
May 26th | Bilingual Word Embeddings and Unsupervised SMT (Viktor Hangya) | | | mp4 |
June 1st | Training and RNN/LSTMs | | | mp4 |
June 2nd | Exercise 3 Released. Due Monday June 7th at 15:00. | | exercise3.pdf | mp4 |
June 8th | Exercise 4 Released. Due Monday June 14th at 15:00. | | exercise4.pdf | mp4 |
June 9th | Encoder-Decoder and Attention (Jindřich Libovický) | | | mp4 |
June 15th | Exercise 4 review | | | mp4 |
June 16th | Transformer (and Document NMT) | | | mp4 |
June 22nd | Unsupervised NMT | | (see previous) | mp4 |
June 23rd | Transfer Learning for Unsupervised NMT (Alexandra Chronopoulou) | | | mp4 |
June 29th | Exercise 5 Released. Due Monday July 5th at 15:00. | | exercise5_updated.pdf | mp4 |
June 30th | Overcoming Sparsity in NMT (research talk) | | | mp4 |
July 6th | Exercise 6 (PyTorch NLP tutorial) released; not collected; recommended to work through on your own during the summer vacation. | | exercise6.pdf | mp4 |
July 7th | Operation Sequence Model and OOV Translation | | 14_part1_OSM.pdf 14_part2_OOV.pdf | mp4 |
July 14th | Review and dry run for Zoom exam (you need a working webcam) | | trial_exam.doc | mp4 |
July 20th | Exam, live on Zoom (you need a working webcam) | | exam_2021.doc | |
Literature:
Philipp Koehn's book Statistical Machine Translation.
Kevin Knight's tutorial on SMT (particularly look at IBM Model 1).
Philipp Koehn's other book Neural Machine Translation.