Low-Resourced Machine Translation for Senegalese Wolof Language

Derguene Mbaye; Moussa Diallo; Thierno Ibrahima Diop

arXiv:2305.00606 · cs.CL · May 2, 2023 · 1 citation

TL;DR

This paper introduces a Wolof/French parallel corpus of 123,000 sentences and evaluates RNN-based machine translation models on it. It reports gains from subword segmentation and compares French-Wolof models against French-English models trained under the same experimental conditions.

Contribution

It provides one of the first Wolof/French parallel corpora and assesses RNN-based translation models in low-resource settings, demonstrating the benefits of subword modeling.

Findings

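01  Models trained on subworded data translate better than those trained on unsegmented data.

02  Under the same experimental conditions, models trained on the French-English pair outperform those trained on the French-Wolof pair.

03  Results vary with the language pair: the more closely related French-English pair yields better scores than French-Wolof.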

Abstract

Natural Language Processing (NLP) research has advanced rapidly in recent years, with major breakthroughs establishing new benchmarks. However, these advances have mainly benefited a small group of languages commonly referred to as resource-rich, such as English and French. The majority of other languages, which have fewer resources, are left behind; this is the case for most African languages, including Wolof. In this work, we present a parallel Wolof/French corpus of 123,000 sentences, on which we ran experiments with machine translation models based on Recurrent Neural Networks (RNNs) in different data configurations. We observed performance gains for models trained on subworded data, and for models trained on the French-English language pair compared with those trained on the French-Wolof pair under the same experimental conditions.
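For readers unfamiliar with "subworded data", the sketch below illustrates one common way to produce it: byte-pair encoding (BPE), which repeatedly merges the most frequent adjacent symbol pair. The abstract does not say which subword method the authors used, so BPE is an assumption here, and the names `learn_bpe`, `merge_pair`, and `segment` are hypothetical helpers written for this illustration, not the paper's code.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of training words."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the new merge to every word in the working vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split `word` into subword units by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Toy training words, chosen only for illustration (not from the corpus).
    toy_words = ["low", "low", "lower", "lowest", "newer", "newest"]
    rules = learn_bpe(toy_words, num_merges=8)
    print(segment("lowest", rules))
    print(segment("newer", rules))
```

In a low-resource setting, the appeal of this kind of segmentation is that rare or unseen words can still be represented as sequences of more frequent units, which keeps the translation model's vocabulary small.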

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification