NaijaRC: A Multi-choice Reading Comprehension Dataset for Nigerian Languages

Anuoluwapo Aremu, Jesujoba O. Alabi, Daud Abolade, Nkechinyere F. Aguobi, Shamsuddeen Hassan Muhammad, David Ifeoluwa Adelani

arXiv:2308.09768·cs.CL·May 21, 2024·2 cites

TL;DR

This paper introduces NaijaRC, a multi-choice reading comprehension dataset for Nigerian languages, and evaluates baseline models and LLM prompting techniques to advance NLP resources for these languages.

Contribution

It creates the first high-school level reading comprehension dataset for Nigerian languages and benchmarks cross-lingual transfer and LLM prompting methods.

Findings

01 Cross-lingual transfer yields moderate accuracy improvements.

02 GPT-4 performs competitively with baseline models.

03 NaijaRC provides a valuable resource for NLP research in Nigerian languages.

Abstract

In this paper, we create NaijaRC, a new multi-choice reading comprehension dataset for three native Nigerian languages, based on high-school reading comprehension examinations. We provide baseline results by performing cross-lingual transfer with a pre-trained encoder-only model, using the existing English RACE and Belebele training datasets. Additionally, we provide results by prompting large language models (LLMs) like GPT-4.
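As an illustration of the LLM prompting evaluation mentioned in the abstract, the following is a minimal sketch of how a multi-choice reading comprehension item could be turned into a prompt and scored. It is not the authors' code: the `MCItem` fields, the prompt wording, the four-option A-D layout (borrowed from RACE-style items), and the answer-parsing rule are all assumptions made for the example, and the model replies are supplied as plain strings rather than fetched from any API.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

LETTERS = "ABCD"  # assumption: four options per question, as in RACE-style items


@dataclass
class MCItem:
    """Hypothetical container for one multi-choice item."""
    passage: str
    question: str
    options: List[str]
    answer: int  # index of the gold option in `options`


def build_prompt(item: MCItem) -> str:
    """Format passage, question, and lettered options as a zero-shot prompt."""
    opts = "\n".join(f"{LETTERS[i]}. {o}" for i, o in enumerate(item.options))
    return (
        f"Passage: {item.passage}\n"
        f"Question: {item.question}\n"
        f"{opts}\n"
        "Answer with a single letter (A, B, C, or D)."
    )


def parse_answer(reply: str) -> Optional[int]:
    """Return the index of the first standalone option letter, or None."""
    match = re.search(r"\b([ABCD])\b", reply.strip().upper())
    return LETTERS.index(match.group(1)) if match else None


def accuracy(items: List[MCItem], replies: List[str]) -> float:
    """Fraction of replies whose parsed letter matches the gold answer."""
    correct = sum(parse_answer(r) == it.answer for it, r in zip(items, replies))
    return correct / len(items)
```

Unparseable replies count as wrong in this sketch; a real evaluation might handle them differently.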

Code & Models

Repositories

aremuadeolajr/naijarc (Official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Label Smoothing · Layer Normalization · Softmax · Dense Connections