NaijaNLP: A Survey of Nigerian Low-Resource Languages
Isa Inuwa-Dutse

TL;DR
This paper provides the first comprehensive review of NLP research for Nigeria's three major low-resource languages, highlighting resource gaps, challenges, and future directions for advancing language understanding and generation.
Contribution
It offers a detailed assessment of linguistic resources and challenges in Hausa, Yoruba, and Igbo, emphasizing the need for resource development and collaborative efforts.
Findings
Only 25.1% of studies contribute new linguistic resources
Diacritics representation remains under-explored
Significant resource gaps hinder NLP progress
Abstract
With over 500 languages in Nigeria, three languages (Hausa, Yorùbá and Igbo), spoken by over 175 million people, account for about 60% of the spoken languages. However, these languages are categorised as low-resource due to insufficient resources to support tasks in computational linguistics. Several research efforts and initiatives have been presented; however, a coherent understanding of the state of Natural Language Processing (NLP), from grammatical formalisation to linguistic resources that support complex tasks such as language understanding and generation, is lacking. This study presents the first comprehensive review of advancements in low-resource NLP (LR-NLP) research across the three major Nigerian languages (NaijaNLP). We quantitatively assess the available linguistic resources and identify key challenges. Although a growing body of literature addresses various NLP…
