SSA-COMET: Do LLMs Outperform Learned Metrics in Evaluating MT for Under-Resourced African Languages?
Senyu Li, Jiayi Wang, Felermino D. M. A. Ali, Colin Cherry, Daniel Deutsch, Eleftheria Briakou, Rui Sousa-Silva, Henrique Lopes Cardoso, Pontus Stenetorp, David Ifeoluwa Adelani

TL;DR
This paper introduces SSA-MTE, a large-scale human-annotated dataset for MT evaluation in African languages, and develops the SSA-COMET metrics, which outperform AfriCOMET and are competitive with top LLMs, especially for low-resource languages.
Contribution
The paper presents the SSA-MTE dataset and the SSA-COMET metrics, advancing MT evaluation for African languages and benchmarking LLMs on the same task.
Findings
SSA-COMET outperforms AfriCOMET in evaluations.
SSA-COMET is competitive with GPT-4o, Claude-3.7, and Gemini 2.5 Pro.
Resources are openly available for future research.
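Claims that one metric "outperforms" another are usually backed by how well each metric's scores agree with human judgments. The summary does not say which statistic the paper uses, so the following is only a minimal sketch assuming Spearman rank correlation. All scores and the names `human`, `metric_a`, and `metric_b` are hypothetical illustrations, not data from SSA-MTE.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical human quality scores for five translations.
human = [10, 35, 50, 70, 90]
# Hypothetical scores from two candidate metrics on the same translations.
metric_a = [0.21, 0.30, 0.55, 0.61, 0.88]
metric_b = [0.40, 0.35, 0.30, 0.70, 0.65]

print(f"metric_a vs human: {spearman(human, metric_a):.2f}")
print(f"metric_b vs human: {spearman(human, metric_b):.2f}")
```

Under this setup, the metric with the higher correlation ranks translations more like the human annotators do. A real comparison would use the annotated segments for each language pair rather than five toy values.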
Abstract
Evaluating machine translation (MT) quality for under-resourced African languages remains a significant challenge, as existing metrics often suffer from limited language coverage and poor performance in low-resource settings. While recent efforts, such as AfriCOMET, have addressed some of the issues, they are still constrained by small evaluation sets, a lack of publicly available training data tailored to African languages, and inconsistent performance in extremely low-resource scenarios. In this work, we introduce SSA-MTE, a large-scale human-annotated MT evaluation (MTE) dataset covering 14 African language pairs from the News domain, with over 73,000 sentence-level annotations from a diverse set of MT systems. Based on this data, we develop SSA-COMET and SSA-COMET-QE, improved reference-based and reference-free evaluation metrics. We also benchmark prompting-based approaches using…
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Translation Studies and Practices
Methods: Sparse Evolutionary Training
