Afri-MCQA: Multimodal Cultural Question Answering for African Languages

Atnafu Lambebo Tonja; Srija Anand; Emilio Villa-Cueva; Israel Abebe Azime; Jesujoba Oluwadara Alabi; Muhidin A. Mohamed; Debela Desalegn Yadeta; Negasi Haile Abadi; Abigail Oppong; Nnaemeka Casmir Obiefuna; Idris Abdulmumin; Naome A Etori; Eric Peter Wairagala; Kanda Patrick Tshinu; Imanigirimbabazi Emmanuel; Gabofetswe Malema; Alham Fikri Aji; David Ifeoluwa Adelani; Thamar Solorio

arXiv:2601.05699 · cs.CL · January 15, 2026

TL;DR

This paper introduces Afri-MCQA, a multilingual cultural question-answering benchmark of 7.5k Q&A pairs across 15 African languages. Open-weight large language models perform poorly on it, which points to the need for culturally and linguistically tailored approaches.

Contribution

The paper presents the first multilingual cultural QA benchmark for African languages, created entirely by native speakers, and evaluates LLMs on it, finding poor performance that motivates culturally grounded models.

Findings

1. Open-weight LLMs perform poorly on African-language QA tasks.
2. Significant performance gaps exist between native languages and English.
3. Speech-first approaches and culturally grounded pretraining are needed.

Abstract

Africa is home to over one-third of the world's languages, yet remains underrepresented in AI research. We introduce Afri-MCQA, the first Multilingual Cultural Question-Answering benchmark covering 7.5k Q&A pairs across 15 African languages from 12 countries. The benchmark offers parallel English-African language Q&A pairs across text and speech modalities and was entirely created by native speakers. Benchmarking large language models (LLMs) on Afri-MCQA shows that open-weight models perform poorly across evaluated cultures, with near-zero accuracy on open-ended VQA when queried in native language or speech. To evaluate linguistic competence, we include control experiments meant to assess this specific aspect separate from cultural knowledge, and we observe significant performance gaps between native languages and English for both text and speech. These findings underscore the need for…
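
The native-versus-English comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the record fields (`language`, `query_language`, `correct`) and the toy rows are hypothetical, invented only to show how a per-language accuracy gap might be computed from graded model answers.

```python
from collections import defaultdict


def accuracy_by_language(records):
    """Compute English and native-language accuracy per language, plus the gap.

    Each record is a dict with hypothetical fields:
      'language'       - the African language the item belongs to
      'query_language' - 'english' or 'native' (language the question was asked in)
      'correct'        - True if the model's answer was graded correct
    """
    # counts[language][query_language] = [num_correct, num_total]
    counts = defaultdict(lambda: {"english": [0, 0], "native": [0, 0]})
    for record in records:
        bucket = counts[record["language"]][record["query_language"]]
        bucket[0] += int(record["correct"])
        bucket[1] += 1

    summary = {}
    for language, by_query in counts.items():
        accuracy = {
            query: (correct / total if total else float("nan"))
            for query, (correct, total) in by_query.items()
        }
        # Positive gap means the model does better when queried in English.
        accuracy["gap"] = accuracy["english"] - accuracy["native"]
        summary[language] = accuracy
    return summary


# Toy, made-up graded answers (not results from the paper).
toy_records = [
    {"language": "Amharic", "query_language": "english", "correct": True},
    {"language": "Amharic", "query_language": "english", "correct": True},
    {"language": "Amharic", "query_language": "native", "correct": True},
    {"language": "Amharic", "query_language": "native", "correct": False},
]

toy_summary = accuracy_by_language(toy_records)
```

Because the benchmark's English and African-language Q&A pairs are parallel, a gap computed this way compares the same questions under two query languages, which is what makes the difference attributable to language rather than to question content.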

Code & Models

Datasets

Atnafu/Afri-MCQA (dataset, 88 downloads)

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques