Building a Functional Machine Translation Corpus for Kpelle
Kweku Andoh Yamoah, Jackson Weako, Emmanuel J. Dorley

TL;DR
This paper introduces the first publicly available English-Kpelle machine translation dataset, demonstrating effective fine-tuning of a multilingual model and highlighting its potential for advancing NLP in low-resource African languages.
Contribution
It provides the first English-Kpelle dataset for machine translation and shows how fine-tuning improves translation quality, enabling broader NLP applications for low-resource languages.
Findings
Achieved BLEU scores of up to 30 in Kpelle-to-English translation.
Positioned the dataset as a resource for broader NLP tasks, including speech recognition and language modelling.
Obtained results in line with NLLB-200 benchmarks for other African languages.
Abstract
In this paper, we introduce the first publicly available English-Kpelle dataset for machine translation, comprising over 2000 sentence pairs drawn from everyday communication, religious texts, and educational materials. By fine-tuning Meta's No Language Left Behind (NLLB) model on two versions of the dataset, we achieved BLEU scores of up to 30 in the Kpelle-to-English direction, demonstrating the benefits of data augmentation. Our findings align with NLLB-200 benchmarks on other African languages, underscoring Kpelle's potential for competitive performance despite its low-resource status. Beyond machine translation, this dataset enables broader NLP tasks, including speech recognition and language modelling. We conclude with a roadmap for future dataset expansion, emphasizing orthographic consistency, community-driven validation, and interdisciplinary collaboration to advance inclusive…
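For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how a corpus-level BLEU score on a 0-100 scale can be computed. It is not the authors' evaluation script: the function names `ngrams` and `corpus_bleu` are hypothetical, and the sketch assumes whitespace tokenization, a single reference per hypothesis, and no smoothing, so its numbers will generally differ from those produced by standard tools. The sentences in the usage lines are English toy strings, not data from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale.

    Assumptions (not from the paper): whitespace tokenization,
    one reference per hypothesis, no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # hypothesis n-gram counts, per n-gram order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Counter intersection clips each count by the reference count.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())

    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes output shorter than the references.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Toy usage: one hypothesis scored against one reference.
score = corpus_bleu(["the cat sat on the mat"], ["the cat sat on a mat"])
print(round(score, 1))
```

Because the score is a geometric mean of 1- to 4-gram precisions, a single substituted word lowers every n-gram order that spans it, which is why scores well below 100 can still correspond to largely correct translations.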
Taxonomy
Topics: Natural Language Processing Techniques
Methods: ALIGN
