NaijaRC: A Multi-choice Reading Comprehension Dataset for Nigerian Languages
Anuoluwapo Aremu, Jesujoba O. Alabi, Daud Abolade, Nkechinyere F. Aguobi, Shamsuddeen Hassan Muhammad, David Ifeoluwa Adelani

TL;DR
This paper introduces NaijaRC, a multi-choice reading comprehension dataset for Nigerian languages, and evaluates baseline models and LLM prompting techniques to advance NLP resources for these languages.
Contribution
It creates the first high-school level reading comprehension dataset for Nigerian languages and benchmarks cross-lingual transfer and LLM prompting methods.
Findings
Cross-lingual transfer yields moderate accuracy improvements.
GPT-4 performs competitively with baseline models.
NaijaRC provides a valuable resource for NLP research in Nigerian languages.
Abstract
In this paper, we create NaijaRC: a new multi-choice reading comprehension dataset for three native Nigerian languages, based on high-school reading comprehension examinations. We provide baseline results by performing cross-lingual transfer with a pre-trained encoder-only model, using the existing English RACE and Belebele training datasets. Additionally, we provide results by prompting large language models (LLMs) such as GPT-4.
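As a rough illustration of the prompting evaluation mentioned in the abstract, the sketch below formats one multi-choice item as a prompt, extracts a letter from a model reply, and computes accuracy. It is not the authors' code: the `MCItem` fields, the prompt wording, the four-option assumption, and the helper names are all hypothetical, and the example item is an invented placeholder, not taken from NaijaRC.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

LETTERS = "ABCD"  # assumption: four answer options per question


@dataclass
class MCItem:
    """Hypothetical container for one multi-choice reading comprehension item."""
    passage: str
    question: str
    options: List[str]
    answer: str  # gold option letter, e.g. "B"


def build_prompt(item: MCItem) -> str:
    """Lay out passage, question, and lettered options as a plain-text prompt."""
    lines = [f"Passage: {item.passage}", f"Question: {item.question}"]
    for letter, option in zip(LETTERS, item.options):
        lines.append(f"{letter}. {option}")
    lines.append("Answer with a single letter (A, B, C, or D).")
    return "\n".join(lines)


def parse_choice(reply: str) -> Optional[str]:
    """Naively take the first standalone capital A-D in the reply, if any."""
    match = re.search(r"\b([ABCD])\b", reply)
    return match.group(1) if match else None


def accuracy(predictions: List[Optional[str]], gold: List[str]) -> float:
    """Fraction of items whose predicted letter equals the gold letter."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Invented placeholder item, for illustration only.
item = MCItem(
    passage="Ada walked to the market to buy yams.",
    question="What did Ada go to buy?",
    options=["Rice", "Yams", "Fish", "Salt"],
    answer="B",
)
prompt = build_prompt(item)
prediction = parse_choice("The answer is B.")
```

In practice the reply would come from an LLM API call, which is left out here because the summary does not specify the model interface or prompt template used.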
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Label Smoothing · Layer Normalization · Softmax · Dense Connections
