Bridging Gaps in Natural Language Processing for Yorùbá: A Systematic Review of a Decade of Progress and Prospects
Toheeb Aduramomi Jimoh, Tabea De Wille, Nikola S. Nikolov

TL;DR
This systematic review analyzes a decade of NLP research for Yorùbá, highlighting challenges like resource scarcity and linguistic complexity, and emphasizing the need for more annotated data, models, and inclusive techniques to advance NLP for this under-resourced language.
Contribution
The paper provides a comprehensive analysis of NLP studies for Yorùbá from 2014 to 2024, identifying key challenges, resources, and research gaps to guide future efforts.
Findings
Scarcity of annotated corpora and pre-trained models
Linguistic challenges like tonal complexity and diacritic dependency
Growing resources but socio-cultural constraints limit progress
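
The diacritic dependency noted above can be made concrete with a small sketch. Yorùbá orthography marks tone with accents and some letters with underdots, so the same word can arrive as different Unicode code-point sequences, and removing the marks discards information. The example below uses only Python's standard `unicodedata` module. The helper name `strip_diacritics` is hypothetical and is not taken from the paper; it illustrates the problem and is not a preprocessing step the review recommends.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Hypothetical helper: drop all combining marks (tone marks, underdots)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# "Yorùbá" written with precomposed characters (u-grave, a-acute).
word = "Yor\u00f9b\u00e1"

# The same visible word in composed (NFC) and decomposed (NFD) form.
nfc = unicodedata.normalize("NFC", word)
nfd = unicodedata.normalize("NFD", word)

print(len(nfc), len(nfd))      # code-point counts differ between the two forms
print(nfc == nfd)              # raw string comparison treats them as different
print(strip_diacritics(word))  # tone marks are lost once diacritics are removed
```

Normalizing text to a single form before comparison or tokenization avoids treating the two encodings as different words, while stripping diacritics altogether discards the tone marking.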
Abstract
Natural Language Processing (NLP) is becoming a dominant subset of artificial intelligence as the need to help machines understand human language looks indispensable. Several NLP applications are ubiquitous, partly due to the myriad of datasets being churned out daily through mediums like social networking sites. However, the growing development has not been evident in most African languages due to the persisting resource limitations, among other issues. Yorùbá language, a tonal and morphologically rich African language, suffers a similar fate, resulting in limited NLP usage. To encourage further research towards improving this situation, this systematic literature review aims to comprehensively analyse studies addressing NLP development for Yorùbá, identifying challenges, resources, techniques, and applications. A well-defined search string from a structured protocol was…
Taxonomy
Topics: Video Analysis and Summarization · Natural Language Processing Techniques · Speech Recognition and Synthesis
