Building a Functional Machine Translation Corpus for Kpelle

Kweku Andoh Yamoah; Jackson Weako; Emmanuel J. Dorley

arXiv:2505.18905·cs.CL·May 27, 2025

TL;DR

This paper introduces the first publicly available English-Kpelle machine translation dataset, demonstrating effective fine-tuning of a multilingual model and highlighting its potential for advancing NLP in low-resource African languages.

Contribution

It provides the first English-Kpelle dataset for machine translation and shows how fine-tuning improves translation quality, enabling broader NLP applications for low-resource languages.

Findings

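01. Fine-tuning Meta's NLLB model reached BLEU scores of up to 30 in the Kpelle-to-English direction.

02. The dataset can also support NLP tasks beyond translation, including speech recognition and language modelling.

03. Results are in line with NLLB-200 benchmarks on other African languages.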

Abstract

In this paper, we introduce the first publicly available English-Kpelle dataset for machine translation, comprising over 2000 sentence pairs drawn from everyday communication, religious texts, and educational materials. By fine-tuning Meta's No Language Left Behind (NLLB) model on two versions of the dataset, we achieved BLEU scores of up to 30 in the Kpelle-to-English direction, demonstrating the benefits of data augmentation. Our findings align with NLLB-200 benchmarks on other African languages, underscoring Kpelle's potential for competitive performance despite its low-resource status. Beyond machine translation, this dataset enables broader NLP tasks, including speech recognition and language modelling. We conclude with a roadmap for future dataset expansion, emphasizing orthographic consistency, community-driven validation, and interdisciplinary collaboration to advance inclusive…
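
For readers unfamiliar with the BLEU metric quoted above, the following is a minimal sketch of a corpus-level BLEU computation on a 0-100 scale. It is not the authors' evaluation script: the function names (ngrams, corpus_bleu), the whitespace tokenization, the single reference per sentence, and the example sentences are all assumptions made for illustration, and standard tools apply their own tokenization and may produce different numbers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple of tokens) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU (0-100), one reference per hypothesis.

    Assumption: whitespace tokenization and no smoothing; this is an
    illustrative sketch, not a replacement for a standard BLEU tool.
    """
    matched = [0] * max_n  # clipped n-gram matches, per n-gram order
    total = [0] * max_n    # n-grams produced by the system, per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            total[n - 1] += sum(h_counts.values())

    # Without smoothing, any order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty: penalize output shorter than the reference.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    # Hypothetical English outputs and references, for illustration only.
    system_output = ["the children are going to school today"]
    reference = ["the children are going to school"]
    print(round(corpus_bleu(system_output, reference), 1))
```

On this scale, a score of 100 means the system output matches the reference exactly, so a score of up to 30 indicates partial n-gram overlap with the reference translations.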

Taxonomy

Topics: Natural Language Processing Techniques

Methods: ALIGN