Contextualising Levels of Language Resourcedness that affect NLP tasks

C. Maria Keet; Langa Khumalo

arXiv:2309.17035·cs.CL·January 21, 2026

TL;DR

This paper proposes a nuanced framework for classifying languages based on their resource availability for NLP, moving beyond the simple high/low resource dichotomy to improve research planning and resource allocation.

Contribution

It introduces a matrix-based typology that places languages along a resource spectrum according to contextual societal features, with a particular focus on African languages.

Findings

01 Develops a matrix for language resource classification

02 Provides contextual features for each resource level

03 Enhances understanding of language resource distribution

Abstract

Several widely used software applications involve some form of natural language processing, with tasks ranging from digitising hardcopies and text processing to speech generation. Varied language resources are used to develop software systems that accomplish a wide range of natural language processing (NLP) tasks, such as the ubiquitous spellcheckers and chatbots. Languages are typically characterised as either low-resourced (LRL) or high-resourced languages (HRL), with African languages having been characterised as resource-scarce and English as by far the most well-resourced language. But what lies in between? We argue that the dichotomous typology of LRL and HRL for all languages is problematic. Through a clear understanding of language resources situated in a society, a matrix is developed that characterises languages as Very LRL, LRL, RL, HRL and Very HRL. The characterisation is…

Taxonomy

Topics: Second Language Learning and Teaching

Methods: Focus