Word Sense Disambiguation (WSD) and Machine Translation (MT) are two key problems in natural language processing in which the lexicon plays a critical role. Many different inventories of word senses exist for any particular language, but a minimal set of senses for a word can be defined by looking at its translations into other languages, counting only translations that are not synonyms of one another.
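The idea of defining senses through translations can be illustrated with a minimal sketch. The word pairs and synonym groups below are invented toy data, not taken from the course materials, and the function names are hypothetical. The sketch assumes word-aligned English-German pairs are already available and that target-language synonyms are known in advance.

```python
from collections import defaultdict

# Hypothetical toy data: (English word, aligned German translation) pairs,
# as might be read off a word-aligned parallel corpus.
ALIGNED_PAIRS = [
    ("bank", "Bank"),
    ("bank", "Ufer"),
    ("bank", "Bank"),
    ("bank", "Geldinstitut"),
    ("plant", "Pflanze"),
    ("plant", "Fabrik"),
    ("plant", "Werk"),
]

# Hypothetical synonym groups in the target language (an assumption of this
# sketch): translations in the same group count as a single sense.
TARGET_SYNONYMS = [
    {"Bank", "Geldinstitut"},
    {"Fabrik", "Werk"},
]


def canonical(translation, synonym_groups):
    """Map a translation to a fixed representative of its synonym group."""
    for group in synonym_groups:
        if translation in group:
            return min(group)  # alphabetically first member, for determinism
    return translation


def sense_inventory(pairs, synonym_groups):
    """Collect, per source word, the set of non-synonymous translations."""
    senses = defaultdict(set)
    for source, target in pairs:
        senses[source].add(canonical(target, synonym_groups))
    return {word: sorted(labels) for word, labels in senses.items()}


inventory = sense_inventory(ALIGNED_PAIRS, TARGET_SYNONYMS)
for word, labels in inventory.items():
    print(word, labels)
```

In this toy setting each remaining translation acts as a sense label, so disambiguating an English word amounts to choosing its German translation.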
Content:
The seminar will begin with the basics of Statistical Machine Translation and Word Sense Disambiguation, and then look at attempts to use approaches taken from the WSD literature in MT.
Goals:
The goal of the seminar is to understand the basics of MT and WSD and, in particular, the important role of the lexicon in both problems.
Email Address: SubstituteMyLastName@cis.uni-muenchen.de
Room C003, Tuesdays, 16:00 to 18:00 (c.t.)
Date | Topic | Reading (DO BEFORE THE MEETING!) | Slides |
October 13th | Organizational Meeting, Personal Information, Orientation Test | | |
October 20th | Introduction to Statistical Machine Translation | | ppt pdf |
October 27th | Bitext alignment (extracting lexical knowledge from parallel corpora) | | ppt pdf |
November 3rd | Many-to-many alignments (also: Referat, the student presentations!) | | ppt pdf |
November 10th | Phrase-based model; Log-linear model and Minimum Error Rate Training (two slide sets) | | ppt pdf |
November 17th | Decoding (Guest Lecture from Ales Tamchyna) | | |
November 24th | Advanced Word Alignment, Morphology, Syntax | | ppt pdf |
December 1st | Introduction to Word Sense Disambiguation | Start reading Navigli (see below) | ppt pdf |
December 8th | Introduction to Linear Models | Navigli, Sections 1 and 2 | pptx pdf |
December 15th | 2 student presentations (Referate, see below) | | |
December 22nd | *Kalahari* computer lab (confirmed; it is near the new lecture halls). Student presentation (Referat), followed by a computer lab. | Navigli, Sections 3 and 5 | tar.gz (see the included file Slides.pdf; note also that the labels used in this classification problem are 0 and 1, meaning false and true, but wapiti does multiclass classification, so you can use any string as a label) |
Presentation topics (Referatsthemen; name: topic)
Date | Topic | Materials | Term Paper (Hausarbeit) Received |
December 15th | Neuburg: Literature Supervised WSD | yes | |
December 15th | Krammer: Literature Dictionary-based WSD | yes | |
December 22nd | Andreyeva: Literature Unsupervised WSD | yes | |
January 12th | Höps: Project 6 Moses EN-DE | yes | |
January 12th | Siilivask: Project 2 Cross-lingual substitution | yes | |
January 19th | Handelshauser: Project 1 Supervised WSD | yes | |
January 19th | Moiseeva: Project 4 Wikification | yes | |
January 26th | Ling: Project 7 Google Translate German Compounds | yes | |
January 26th | Conforti: Project WSD for Venetian | yes | |
Literature:
Philipp Koehn's book Statistical Machine Translation
Kevin Knight's tutorial on SMT (look particularly at IBM Model 1)
Roberto Navigli's tutorial on WSD (here is a local copy)