Natural Language Processing for Dialects of a Language: A Survey
Aditya Joshi, Raj Dabre, Diptesh Kanojia, Zhuang Li, Haolan Zhan, Gholamreza Haffari, Doris Dippold

TL;DR
This survey reviews NLP research on dialects, highlighting datasets, approaches, and tasks across multiple languages, emphasizing the need for equitable language technologies and improved benchmarks.
Contribution
It provides a comprehensive overview of NLP methods for dialects, covering datasets, classical and deep learning approaches, and diverse tasks across languages.
Findings
NLP for dialects extends beyond classification to various NLU and NLG tasks.
Deep learning approaches, including pre-trained models, are increasingly used for dialect processing.
Research on dialects highlights challenges in creating equitable language technologies.
Abstract
State-of-the-art natural language processing (NLP) models are trained on massive corpora and report superlative performance on evaluation datasets. This survey delves into an important attribute of these datasets: the dialect of a language. Motivated by the performance degradation of NLP models on dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches. We describe a wide range of NLP tasks in two categories: natural language understanding (NLU) (for tasks such as dialect classification, sentiment analysis, parsing, and NLU benchmarks) and natural language generation (NLG) (for summarisation, machine translation, and dialogue systems). The survey is also broad in its coverage of languages, which include English, Arabic, and German, among others. We observe that past…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
