Bridging the Gap: Enhancing LLM Performance for Low-Resource African Languages with New Benchmarks, Fine-Tuning, and Cultural Adjustments
Tuka Alhanai, Adam Kasumovic, Mohammad Ghassemi, Aven Zitzelberger, Jessica Lundin, Guillaume Chabot-Couture

TL;DR
This paper introduces new benchmarks and methods to improve the performance of Large Language Models on low-resource African languages, addressing disparities and promoting inclusivity in language technology.
Contribution
It creates the first large-scale benchmark dataset for 8 African languages, evaluates current LLM performance, and explores fine-tuning and cultural adjustments to reduce language gaps.
Findings
Fine-tuning improves performance by 5.6% on average.
Cross-lingual transfer yields 2.9% gains.
Cultural adjustments provide a 3.0% performance boost.
Abstract
Large Language Models (LLMs) have shown remarkable performance across various tasks, yet significant disparities remain for non-English languages, and especially native African languages. This paper addresses these disparities by creating approximately 1 million human-translated words of new benchmark data in 8 low-resource African languages, covering a population of over 160 million speakers of: Amharic, Bambara, Igbo, Sepedi (Northern Sotho), Shona, Sesotho (Southern Sotho), Setswana, and Tsonga. Our benchmarks are translations of Winogrande and three sections of MMLU: college medicine, clinical knowledge, and virology. Using the translated benchmarks, we report previously unknown performance gaps between state-of-the-art (SOTA) LLMs in English and African languages. Finally, using results from over 400 fine-tuned models, we explore several methods to reduce the LLM performance gap,…
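The performance gap mentioned in the abstract can be read as the difference in multiple-choice accuracy between the English benchmark and its translation. The following is a minimal sketch of that calculation, not the authors' evaluation code: the function names, the record layout, and the toy records are all hypothetical and are not results from the paper.

```python
from collections import defaultdict


def accuracy_by_language(records):
    """Compute accuracy per language.

    `records` is an iterable of (language, predicted_choice, gold_choice)
    tuples. This layout is an assumption made for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for language, predicted, gold in records:
        total[language] += 1
        if predicted == gold:
            correct[language] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


def gap_to_reference(accuracies, reference="English"):
    """Return reference accuracy minus each other language's accuracy."""
    ref = accuracies[reference]
    return {
        lang: ref - acc
        for lang, acc in accuracies.items()
        if lang != reference
    }


# Toy placeholder records (NOT data from the paper).
toy_records = [
    ("English", "A", "A"),
    ("English", "B", "B"),
    ("English", "C", "C"),
    ("English", "D", "A"),
    ("Shona", "A", "A"),
    ("Shona", "B", "C"),
    ("Shona", "C", "C"),
    ("Shona", "D", "A"),
]

acc = accuracy_by_language(toy_records)
gaps = gap_to_reference(acc)
print(acc)
print(gaps)
```

A positive gap means the model answers more questions correctly in English than in the translated language. In practice each language would use the same set of translated questions, so the comparison is item for item.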
Taxonomy
Topics: ICT in Developing Communities · Text Readability and Simplification · Second Language Learning and Teaching
