Meenz bleibt Meenz, but Large Language Models Do Not Speak Its Dialect
Minh Duc Bui, Manuel Mager, Peter Herbert Kann, Katharina von der Wense

TL;DR
This paper investigates how well large language models can generate and understand the Meenzerisch dialect, reveals significant limitations, and emphasizes the need for more resources and focused research on German dialects.
Contribution
The first NLP study on the Meenzerisch dialect. It introduces a digital dictionary dataset and evaluates LLMs' ability to generate dialect words and definitions.
Findings
LLMs perform poorly in generating dialect definitions and words, with accuracy below 10%.
Few-shot learning and rule extraction improve results but remain insufficient.
These results highlight the need for more resources and dedicated research on German dialects.
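To make the accuracy figure above concrete, here is a minimal sketch of one way word-generation accuracy could be computed over dictionary entries, where each entry pairs a dialect word with its Standard German meaning. It assumes case-insensitive exact-match scoring. That is an assumption, not the paper's documented protocol: this summary does not say how predictions were judged, and definitions in particular may have been assessed differently. The names `Entry`, `normalize`, and `word_generation_accuracy` are hypothetical, and apart from "Meenz" (the dialect name for Mainz, taken from the title) the example entries are placeholders, not data from the dataset.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical dictionary record: a Meenzerisch word and its Standard German meaning."""
    dialect_word: str
    definition_de: str


def normalize(text: str) -> str:
    # Assumption: case and Unicode composition differences should not count as errors.
    return unicodedata.normalize("NFC", text).strip().casefold()


def word_generation_accuracy(entries: list[Entry], predictions: list[str]) -> float:
    """Fraction of entries whose predicted dialect word exactly matches the gold word."""
    if len(entries) != len(predictions):
        raise ValueError("need exactly one prediction per entry")
    if not entries:
        return 0.0
    hits = sum(
        normalize(entry.dialect_word) == normalize(pred)
        for entry, pred in zip(entries, predictions)
    )
    return hits / len(entries)


if __name__ == "__main__":
    # Toy entries: "Meenz" comes from the title; the second entry is a placeholder.
    toy = [Entry("Meenz", "Mainz"), Entry("placeholder_word", "placeholder definition")]
    print(word_generation_accuracy(toy, ["meenz", "wrong_guess"]))  # 0.5
```

Exact match is a strict criterion. It would count a correct word with a spelling variant as wrong, which matters for a dialect without a standardized orthography, so a real evaluation might reasonably use a more lenient or human judgment.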
Abstract
Meenzerisch, the dialect spoken in the German city of Mainz, is also the traditional language of the Mainz carnival, a yearly celebration well known throughout Germany. However, Meenzerisch is on the verge of dying out, a fate it shares with many other German dialects. Natural language processing (NLP) has the potential to help with the preservation and revival efforts of languages and dialects. However, so far no NLP research has looked at Meenzerisch. This work presents the first research in the field of NLP that is explicitly focused on the dialect of Mainz. We introduce a digital dictionary, an NLP-ready dataset derived from an existing resource (Schramm, 1966), to support researchers in modeling and benchmarking the language. It contains 2,351 words in the dialect paired with their meanings described in Standard German. We then use this dataset to answer the following research…
Taxonomy
Topics: Linguistic Variation and Morphology · Authorship Attribution and Profiling · Natural Language Processing Techniques
