Connecting Ideas in 'Lower-Resource' Scenarios: NLP for National Varieties, Creoles and Other Low-resource Scenarios
Aditya Joshi, Diptesh Kanojia, Heather Lent, Hour Kaing, Haiyue Song

TL;DR
This paper discusses the challenges and approaches in NLP for low-resource languages, dialects, and creoles, emphasizing the need for collaboration and innovative methods to overcome data scarcity.
Contribution
It provides an overview of common challenges and themes in NLP for lower-resource scenarios, connecting past ideas to current research to foster collaboration.
Findings
Identifies key challenges in processing low-resource languages.
Highlights approaches to overcome data scarcity.
Encourages cross-disciplinary collaboration.
Abstract
Despite excellent results on benchmarks over a small subset of languages, large language models struggle to process text from languages situated in 'lower-resource' scenarios such as dialects/sociolects (national or social varieties of a language), Creoles (languages arising from linguistic contact between multiple languages) and other low-resource languages. This introductory tutorial will identify common challenges, approaches, and themes in natural language processing (NLP) research for confronting and overcoming the obstacles inherent to data-poor contexts. By connecting past ideas to the present field, this tutorial aims to ignite collaboration and cross-pollination between researchers working in these scenarios. Our notion of 'lower-resource' broadly denotes the outstanding lack of data required for model training - and may be applied to scenarios apart from the three covered in…
Taxonomy
Topics: Multilingual Education and Policy · Translation Studies and Practices · Second Language Learning and Teaching
