The Zeno's Paradox of 'Low-Resource' Languages

Hellina Hailu Nigatu; Atnafu Lambebo Tonja; Benjamin Rosman; Thamar Solorio; Monojit Choudhury

arXiv:2410.20817·cs.CL·October 29, 2024

TL;DR

This paper analyzes how NLP research defines and studies low-resource languages, revealing multiple axes of low-resourcedness and advocating for clearer terminology and understanding of these factors.

Contribution

It provides a qualitative analysis of 150 papers to identify axes influencing low-resource language classification and promotes explicit definitions in NLP research.

Findings

1. Multiple axes contribute to low-resourcedness
2. Lack of consensus on low-resource definitions
3. Difficulty in tracking progress for individual languages

Abstract

The disparity in the languages commonly studied in Natural Language Processing (NLP) is typically reflected by referring to languages as low- vs. high-resourced. However, there is limited consensus on what exactly qualifies as a 'low-resource language.' To understand how NLP papers define and study 'low resource' languages, we qualitatively analyzed 150 papers from the ACL Anthology and popular speech-processing conferences that mention the keyword 'low-resource.' Based on our analysis, we show how several interacting axes contribute to 'low-resourcedness' of a language and why that makes it difficult to track progress for each individual language. We hope our work (1) elicits explicit definitions of the terminology when it is used in papers and (2) provides grounding for the different axes to consider when connoting a language as low-resource.

Taxonomy

Topics: Computability, Logic, AI Algorithms