Ngambay-French Neural Machine Translation (sba-Fr)
Sakayo Toadoum Sari, Angela Fan, Lema Logamou Seknewna

TL;DR
This paper introduces the first Ngambay-French translation dataset and fine-tunes pre-trained models for low-resource NMT, finding that M2M100 achieves the highest BLEU scores among the models compared.
Contribution
Created the first Ngambay-French translation dataset and fine-tuned pre-trained models for low-resource NMT in Chad.
Findings
M2M100 outperforms other models in BLEU scores
The dataset enables further research in Ngambay language translation
Synthetic data improves translation performance
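Since the findings are stated in terms of BLEU, a minimal sketch of how a corpus-level BLEU score is computed may help when reading them. This is not the authors' evaluation code: the function names (ngrams, corpus_bleu), the whitespace tokenization, the single reference per sentence, and the absence of smoothing are all simplifying assumptions, and the French sentences are invented placeholders, not entries from the dataset.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    Illustrative only: whitespace tokenization and no smoothing, so the
    numbers are not comparable to scores from standard evaluation tools.
    """
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # hypothesis n-gram counts, per n-gram order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Counter intersection clips each count by the reference count.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())

    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    # Invented placeholder sentences, not taken from the sba-Fr dataset.
    hyps = ["le chat dort sur le tapis", "il pleut beaucoup ce matin"]
    refs = ["le chat dort sur le tapis", "il pleut beaucoup ce soir"]
    print(round(corpus_bleu(hyps, refs), 2))
```

For reported results, an established implementation with standardized tokenization is preferable, because BLEU values depend heavily on tokenization and smoothing choices.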
Abstract
In Africa, and the world at large, there is an increasing focus on developing Neural Machine Translation (NMT) systems to overcome language barriers. NMT for low-resource languages is particularly compelling because it involves learning from limited labelled data. However, obtaining a well-aligned parallel corpus for low-resource languages can be challenging. The disparity between the technological advancement of a few global languages and the lack of research on NMT for local languages in Chad is striking. End-to-end NMT trials on low-resource Chadian languages have not been attempted. Additionally, unlike for some other African languages, there is a dearth of well-structured online data collection for research in Natural Language Processing. However, a guided approach to data gathering can produce bitext data for many Chadian language translation pairs with well-known languages that have ample data.…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
