Exploring Diachronic and Diatopic Changes in Dialect Continua: Tasks, Datasets and Challenges
Melis Çelikkol, Lydia Körber, Wei Zhao

TL;DR
This paper reviews the intersection of diachronic and diatopic changes in dialect NLP, analyzing tasks, datasets, and challenges across multiple language families to promote inclusive language technology.
Contribution
It systematically unifies diachronic and diatopic dialect research, assessing tasks, datasets, and challenges to advance inclusive NLP for language varieties.
Findings
Assessment of nine dialect NLP tasks and datasets across five dialects from three language families (Slavic, Romance, and Germanic)
Identification of five key open challenges in dialect change research
Critical review of datasets and methodologies for dialect analysis
Abstract
Everlasting contact between language communities leads to constant changes in languages over time, and gives rise to language varieties and dialects. However, the communities speaking non-standard language are often overlooked by non-inclusive NLP technologies. Recently, there has been a surge of interest in studying diatopic and diachronic changes in dialect NLP, but there is currently no research exploring the intersection of both. Our work aims to fill this gap by systematically reviewing diachronic and diatopic papers from a unified perspective. In this work, we critically assess nine tasks and datasets across five dialects from three language families (Slavic, Romance, and Germanic) in both spoken and written modalities. The tasks covered are diverse, including corpus construction, dialect distance estimation, and dialect geolocation prediction, among others. Moreover, we outline…
Taxonomy
Topics: Natural Language Processing Techniques · Linguistics and language evolution · Linguistics and Cultural Studies
