Large language models (such as GPT-2, GPT-3, GPT-4, RoBERTa, and T5) and intelligent chatbots (such as ChatGPT, Bard, and Claude) are a very timely topic.
Contents:
N-gram language models, neural language modeling, word2vec, RNNs, Transformers, BERT, RLHF, ChatGPT, multilingual alignment, prompting, transfer learning, domain adaptation, linguistic knowledge in large language models
Learning objectives:
The participants will first learn the basics of n-gram language models, neural language modeling, RNNs, and Transformers. In the second half of the seminar, participants will present an application of a modern large language model, intelligent chatbot, or similar system. This class will involve a large amount of reading on both basic and advanced topics.
Email Address: Put My Last Name Here @cis.uni-muenchen.de
Tuesdays: 16:00 c.t., Oettingenstr. 67 / 165
For a LaTeX template for the term paper (Hausarbeit), click here.
If this web page does not seem to be up to date, use the refresh button in your browser.
Date | Topic | Materials |
October 31st | Dan Jurafsky and James H. Martin (2023). Speech and Language Processing (3rd ed. draft), Chapter 3, N-gram Language Models | |
November 7th | Yoshua Bengio, Réjean Ducharme, Pascal Vincent, Christian Jauvin (2003). A Neural Probabilistic Language Model. Journal of Machine Learning Research 3, 1137-1155 | |
November 14th | Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean (2013). Efficient Estimation of Word Representations in Vector Space. ICLR | paper |
November 21st | Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin (2017). Attention Is All You Need. NIPS | paper |
November 28th | Lena Voita. NLP Course: Sequence to Sequence (seq2seq) and Attention. Web Tutorial | webpage |
Also November 28th | Presentation topics | Presentation and Writeup; Kathy Hämmerl; Dr. Marion Di Marco; Dr. Viktor Hangya; Faeze Ghorbanpour |
December 5th | Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova (2019) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL-HLT | paper |
Presentation topics (name: topic)
Date | Topic | Reference | Materials | Presenter | Term Paper Received |
December 12th | InstructGPT (AF) | Long Ouyang, Jeff Wu, et al. (2022). Training language models to follow instructions with human feedback. arXiv. | paper | AF | |
(same as above) | Factuality in LLMs (KH) | Tianyu Gao et al. (2023). Enabling Large Language Models to Generate Text with Citations. EMNLP | paper | Jana Grimm | yes |
January 9th, 2024 | Decoding Strategies (KH) | Gian Wiher, Clara Meister, and Ryan Cotterell (2022). On Decoding Strategies for Neural Text Generators. Transactions of the Association for Computational Linguistics, 10:997–1012. | paper | Oliver Kraus | yes |
(same as above) | Position of Relevant Information (VH) | Nelson F. Liu et al. (2023). Lost in the Middle: How Language Models Use Long Contexts. Transactions of the Association for Computational Linguistics | paper | Huixin Chen | yes |
January 16th, 2024 | Importance of Data Understanding (FG) | Yanai Elazar et al. (2023). What's In My Big Data? arXiv preprint arXiv:2310.20707 | paper | Lea Hirlimann | yes |
(same as above) | Emergent Capabilities of LLMs (VH) | Jason Wei et al. (2022). Emergent Abilities of Large Language Models. Transactions on Machine Learning Research | paper | Shuo Xu | yes |
January 23rd, 2024 | Inequality Between Languages (KH) | Orevaoghene Ahia et al. (2023). Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models. arXiv preprint arXiv:2305.13707 | paper | Zhijun Ying | yes |
(same as above) | Data Pruning for LLM Training (FG) | Max Marion et al. (2023). When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale. arXiv preprint arXiv:2309.04564 | paper | Kristina Kuznetsova | yes |
January 30th, 2024 | Subword Segmentation (MDM) | Valentin Hofmann, Janet Pierrehumbert, Hinrich Schuetze (2021). Superbizarre Is Not Superb: Derivational Morphology Improves BERT’s Interpretation of Complex Words. ACL-IJCNLP | paper | Pingjun Hong | yes |