Toucan: Many-to-Many Translation for 150 African Language Pairs
AbdelRahim Elmadany, Ife Adebara, Muhammad Abdul-Mageed

TL;DR
This paper introduces Toucan, a many-to-many translation model supporting 156 African language pairs, along with new resources and benchmarks to improve machine translation for low-resource African languages.
Contribution
The paper presents Toucan, a novel Afrocentric translation model for numerous African languages, and develops the AfroLingu-MT benchmark and the spBLEU-1K evaluation metric.
Findings
Toucan outperforms existing models on African language translation tasks.
Development of AfroLingu-MT benchmark for African languages.
Introduction of spBLEU-1K metric for multilingual translation evaluation.
Abstract
We address a notable gap in Natural Language Processing (NLP) by introducing a collection of resources designed to improve Machine Translation (MT) for low-resource languages, with a specific focus on African languages. First, we introduce two language models (LMs), Cheetah-1.2B and Cheetah-3.7B, with 1.2 billion and 3.7 billion parameters respectively. Next, we finetune the aforementioned models to create Toucan, an Afrocentric machine translation model designed to support 156 African language pairs. To evaluate Toucan, we carefully develop an extensive machine translation benchmark, dubbed AfroLingu-MT, tailored for evaluating machine translation. Toucan significantly outperforms other models, showcasing its remarkable performance on MT for African languages. Finally, we train a new model, spBLEU-1K, to enhance translation evaluation metrics, covering 1K languages, including 614…
Taxonomy
Topics: Linguistic and Sociocultural Studies · Multilingual Education and Policy · Language, Linguistics, Cultural Analysis
