Information Extraction - Lecture (WS 2022-2023)

Summary

Information extraction (IE) is concerned with automatically extracting information from full texts. Applications range from supporting Internet search engines to the automatic construction of domain-specific databases. The methods range from natural language analysis and automatic term recognition to machine learning, and include symbolic, statistical, and hybrid approaches. Complex information structures can be represented with so-called templates (information patterns). The course covers a variety of applications and methods across several application domains.
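To make the idea of a template concrete, here is a minimal sketch of a symbolic (rule-based) extractor that fills a template from one sentence. It is an illustration only, not material from the lecture: the template `LectureTemplate`, its slot names, the pattern `SCHEDULE_RULE`, and the function `fill_template` are all hypothetical, and the input sentence is the schedule line from this page.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class LectureTemplate:
    """Hypothetical template: one slot per piece of information to extract."""
    weekday: Optional[str] = None
    start_hour: Optional[int] = None
    end_hour: Optional[int] = None
    room: Optional[str] = None


# Hand-written rule (an assumption for this sketch): a weekday word,
# "HH to HH", and a room code such as "BU 101" after a slash.
SCHEDULE_RULE = re.compile(
    r"(?P<weekday>\w+days?),\s*"
    r"(?P<start>\d{1,2}) to (?P<end>\d{1,2})"
    r".*/\s*(?P<room>[A-Z]+ \d+)"
)


def fill_template(text: str) -> Optional[LectureTemplate]:
    """Return a filled template, or None if the rule does not match."""
    match = SCHEDULE_RULE.search(text)
    if match is None:
        return None
    return LectureTemplate(
        weekday=match.group("weekday"),
        start_hour=int(match.group("start")),
        end_hour=int(match.group("end")),
        room=match.group("room"),
    )


if __name__ == "__main__":
    line = "Wednesdays, 16 to 18 (c.t.), Oettingenstr. 67 / BU 101"
    print(fill_template(line))
```

A single regular expression like this is brittle by design; it only shows how unstructured text maps onto the slots of a template. Statistical or hybrid methods would replace the hand-written rule with a learned model while keeping the same template as the output structure.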

Contents:

The lecture first defines information extraction and distinguishes it from information retrieval. This includes an overview of the subfields and tasks of information extraction (IE). The specific problems of IE are discussed before the approaches and methods for solving them. Participants will learn what the architecture of a generic IE system looks like, which components it contains, and which resources it builds on.

Learning objectives:

The goal is to understand the problems of automatic information extraction from documents and to become familiar with the necessary components and resources.

Here is a link to the Seminar

Instructor

Alexander Fraser

Email Address: SubstituteMyLastName@cis.uni-muenchen.de

CIS, LMU Munich

Schedule

Wednesdays, 16 to 18 (c.t.), Oettingenstr. 67 / BU 101

If this web page does not seem to be up to date, use the refresh button in your browser.

Date	Topic	Reading (BEFORE THE NEXT MEETING!)	Lecture slides	Video
October 19th	Introduction to Information Extraction	Read Sarawagi: Introduction (pages 1 to 21)	pptx pdf	ws20 mp4
October 26th	History/Related Fields, Sources, Regular Classes	Read Sarawagi: Rule-based (Chapter 2)	pptx pdf	ws20 mp4
November 2nd	Introduction to Evaluation, Rule-based NER		pptx pdf	ws20 mp4
November 9th	More evaluation, IE Tasks, Annotation, Intro Classification-based NER	Read Sarawagi: Statistical Methods (Chapter 3)	pptx pdf	ws20 mp4
November 16th	Decision Trees		pptx pdf	ws20 mp4
November 30th	Linear Models	Read Sarawagi: Statistical Methods (Chapter 3), this time with the math	pptx pdf	ws20 mp4
December 14th	Neural Networks and Word Embeddings		pdf (page 60 corrected)	ws20 mp4
January 11th	Neural Networks for NER, Viktor Hangya		pdf	ws20 mp4
January 18th	Relation Extraction	Read Sarawagi: Relationship Extraction	pdf	ws20 mp4
	Additional Slides, optional (Klausur, Bachelorarbeit, Event and Multimodal Extraction)		pdf
January 25th	Open IE		pdf	ws20 mp4
February 1st	Review			ws20 mp4
February 8th	No Class (exam moved to Feb 15th by student request)
February 15th	Exam (16:00 c.t., BU 101, as usual). Bring blank paper and your ID! The exam is *closed* book.

Literature:

Sunita Sarawagi. Information Extraction. Foundations and Trends in Databases, 1(3):261–377, 2008. Table of Contents
