Low-Resourced Machine Translation for Senegalese Wolof Language
Derguene Mbaye, Moussa Diallo, Thierno Ibrahima Diop

TL;DR
This paper introduces a Wolof/French parallel corpus of 123,000 sentences and evaluates RNN-based machine translation models in several data configurations. It reports performance gains from subword units and compares models trained on the French-Wolof pair with models trained on the French-English pair under the same experimental conditions.
Contribution
It provides a Wolof/French parallel corpus of 123,000 sentences and assesses RNN-based translation models in low-resource settings, reporting performance gains for models trained on subworded data.
Findings
Subworded data improves translation quality.
Models trained on the French-English pair outperform those trained on the French-Wolof pair under the same experimental conditions.
Abstract
Natural Language Processing (NLP) research has made great advancements in recent years, with major breakthroughs that have established new benchmarks. However, these advances have mainly benefited a certain group of languages commonly referred to as resource-rich, such as English and French. The majority of other languages, which have weaker resources, are left behind; this is the case for most African languages, including Wolof. In this work, we present a parallel Wolof/French corpus of 123,000 sentences, on which we conducted experiments with machine translation models based on Recurrent Neural Networks (RNN) in different data configurations. We noted performance gains with the models trained on subworded data, as well as with those trained on the French-English language pair compared to those trained on the French-Wolof pair under the same experimental conditions.
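To make "subworded data" concrete, here is a minimal sketch of one common subword scheme, byte-pair encoding (BPE). The abstract does not say which subword algorithm the authors used, so BPE is an assumption chosen for illustration. The function names (`learn_bpe`, `merge_pair`, `segment`) and the toy English word list are hypothetical and are not taken from the paper or its corpus.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merge rules from a list of words."""
    # Each word starts as a tuple of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken by the pair itself
        # so the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split a (possibly unseen) word into subword units by replaying the merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    toy_words = ["low", "low", "lower", "lowest"]  # toy data, not from the corpus
    rules = learn_bpe(toy_words, num_merges=2)
    print(rules)
    print(segment("lowly", rules))
```

The point of the sketch is that a word unseen in training (here "lowly") is still broken into units the model has seen. That property is the usual motivation for subword segmentation when the parallel corpus is small.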
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
