Developing an English-Efik Corpus and Machine Translation System for Digitization Inclusion
Offiong Bassey Edet, Mbuotidem Sunday Awak, Emmanuel Oyo-Ita, Benjamin Okon Nyong, Ita Etim Bassey

TL;DR
This paper develops an English-Efik machine translation system using a small community-curated corpus, demonstrating that state-of-the-art multilingual models can effectively serve low-resource languages and promote digital inclusion.
Contribution
It introduces a new English-Efik corpus and evaluates fine-tuned multilingual neural models, showing promising translation performance for a previously underrepresented language.
Findings
NLLB-200 outperformed mT5, achieving BLEU scores of 26.64 for English-Efik and 31.21 for Efik-English.
The study demonstrates the feasibility of machine translation for low-resource languages.
The study highlights the importance of inclusive data and culturally grounded evaluation.
Abstract
Low-resource languages serve as invaluable repositories of human history, preserving cultural and intellectual diversity. Despite their significance, they remain largely absent from modern natural language processing systems. While progress has been made for widely spoken African languages such as Swahili, Yoruba, and Amharic, smaller indigenous languages like Efik continue to be underrepresented in machine translation research. This study evaluates the effectiveness of state-of-the-art multilingual neural machine translation models for English-Efik translation, leveraging a small-scale, community-curated parallel corpus of 13,865 sentence pairs. We fine-tuned both the mT5 multilingual model and the NLLB-200 model on this dataset. NLLB-200 outperformed mT5, achieving BLEU scores of 26.64 for English-Efik and 31.21 for Efik-English, with corresponding chrF scores of 51.04 and 47.92,…
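To make the chrF figures above easier to interpret, here is a minimal sketch of a character n-gram F-score in Python. It is a simplified illustration, not the authors' evaluation code: the function names `char_ngrams` and `chrf` are hypothetical, the defaults (n-gram orders 1 to 6, beta = 2, whitespace removed) are assumptions, and the numbers it produces may differ from those of standard tools. The sample strings are English placeholders rather than sentences from the corpus.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of order n after removing whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale (assumed defaults)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        # Clipped overlap: each n-gram counts at most as often as in the reference.
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta with beta = 2 weights recall more heavily than precision.
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)


if __name__ == "__main__":
    # Placeholder strings for illustration only.
    print(chrf("the child is going to the market", "the child goes to the market"))
```

Because the score is built from character n-grams rather than whole words, a translation with a slightly different word form still earns partial credit, which is one reason chrF is often reported alongside BLEU.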
Taxonomy
Topics: Natural Language Processing Techniques · Language and cultural evolution · Computational and Text Analysis Methods
