IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models

David Ifeoluwa Adelani; Jessica Ojo; Israel Abebe Azime; Jian Yun Zhuang; Jesujoba O. Alabi; Xuanli He; Millicent Ochieng; Sara Hooker; Andiswa Bukula; En-Shiun Annie Lee; Chiamaka Chukwuneke; Happy Buzaaba; Blessing Sibanda; Godson Kalipe; Jonathan Mukiibi; Salomon Kabongo; Foutse Yuehgoh; Mmasibidi Setaka; Lolwethu Ndolela; Nkiruka Odu; Rooweither Mabuya; Shamsuddeen Hassan Muhammad; Salomey Osei; Sokhar Samb; Tadesse Kebede Guge; Tombekai Vangoni Sherman; Pontus Stenetorp

arXiv:2406.03368 · cs.CL · January 24, 2025 · 3 cites

TL;DR

IrokoBench introduces a comprehensive benchmark dataset for 17 low-resource African languages, highlighting significant performance gaps in LLMs and emphasizing the need for more development tailored to these languages.

Contribution

The paper presents IrokoBench, a new multilingual benchmark for African languages, and evaluates LLMs, revealing critical gaps and the impact of translation strategies.

Findings

01. LLMs show a large performance gap between high-resource languages and African languages.

02. Proprietary models significantly outperform open models.

03. Translating test sets into English improves performance for some models.

Abstract

Despite the widespread adoption of large language models (LLMs), their remarkable capabilities remain limited to a few high-resource languages. Additionally, many low-resource languages (e.g., African languages) are often evaluated only on basic text classification tasks due to the lack of appropriate or comprehensive benchmarks outside of high-resource languages. In this paper, we introduce IrokoBench, a human-translated benchmark dataset for 17 typologically diverse low-resource African languages covering three tasks: natural language inference (AfriXNLI), mathematical reasoning (AfriMGSM), and multi-choice knowledge-based question answering (AfriMMLU). We use IrokoBench to evaluate zero-shot, few-shot, and translate-test settings (where test sets are translated into English) across 10 open and six proprietary LLMs. Our evaluation reveals a significant performance gap between…

Taxonomy

Topics: Natural Language Processing Techniques · Language and cultural evolution · Multilingual Education and Policy

Methods: Sparse Evolutionary Training · LLaMA