Can LLM Agents Identify Spoken Dialects like a Linguist?
Tobias Bystrich, Lukas Hamm, Maria Hassan, Lea Fischbach, Lucie Flek, Akbar Karimi

TL;DR
This paper investigates whether large language models, used as agents, can classify spoken dialects of languages such as Swiss German from phonetic transcriptions and linguistic resources, and compares their performance with models such as HuBERT and with a human linguist.
Contribution
It introduces an approach that combines ASR-produced phonetic transcriptions with linguistic resources (dialect feature maps, vowel history, and rules) for LLM-based dialect classification, and it provides an LLM baseline and a human linguist baseline for comparison.
Findings
LLM dialect predictions improve when linguistic information is provided.
The human linguist baseline shows that automatically generated transcriptions can be useful for dialect classification.
The same baseline shows that there is still room for improvement.
Abstract
Due to the scarcity of labeled dialectal speech, audio dialect classification is a challenging task for most languages, including Swiss German. In this work, we explore the ability of large language models (LLMs) as agents in understanding the dialects and whether they can show comparable performance to models such as HuBERT in dialect classification. In addition, we provide an LLM baseline and a human linguist one. Our approach uses phonetic transcriptions produced by ASR systems and combines them with linguistic resources such as dialect feature maps, vowel history, and rules. Our findings indicate that, when linguistic information is provided, the LLM predictions improve. The human baseline shows that automatically generated transcriptions can be beneficial for such classifications, but also presents opportunities for improvement.
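The approach described in the abstract, pairing an ASR-produced phonetic transcription with linguistic resources before asking an LLM for a dialect label, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class LinguisticResources, the function build_dialect_prompt, the prompt wording, and all placeholder strings are assumptions made for this example, and no specific LLM API is called.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LinguisticResources:
    """Hypothetical container for the resource types named in the abstract."""
    feature_map_notes: list[str] = field(default_factory=list)
    vowel_history_notes: list[str] = field(default_factory=list)
    rules: list[str] = field(default_factory=list)


def build_dialect_prompt(
    transcription: str,
    candidate_dialects: list[str],
    resources: LinguisticResources | None = None,
) -> str:
    """Assemble a text prompt for an LLM (assumed prompt layout).

    With resources=None this corresponds to a plain LLM baseline that sees
    only the transcription; with resources set, the linguistic information
    is placed in the prompt ahead of the transcription.
    """
    lines = [
        "You are a dialectologist. Classify the dialect of the utterance below.",
        "Candidate dialects: " + ", ".join(candidate_dialects),
    ]
    if resources is not None:
        sections = [
            ("Dialect feature map notes", resources.feature_map_notes),
            ("Vowel history notes", resources.vowel_history_notes),
            ("Rules", resources.rules),
        ]
        for title, notes in sections:
            if notes:  # skip resource types that were not supplied
                lines.append(f"{title}:")
                lines.extend(f"- {note}" for note in notes)
    lines.append(f"Phonetic transcription: {transcription}")
    lines.append("Answer with exactly one candidate dialect.")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder inputs only; real transcriptions and rules would come
    # from the ASR system and the linguistic resources.
    resources = LinguisticResources(rules=["<rule 1>", "<rule 2>"])
    print(build_dialect_prompt("<phonetic transcription>",
                               ["Dialect A", "Dialect B"], resources))
```

Calling the function once with and once without resources yields two prompts that differ only in the linguistic information supplied, which is the comparison the findings refer to.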
