University of Cape Town's WMT22 System: Multilingual Machine Translation for Southern African Languages

Khalid N. Elmadani; Francois Meyer; Jan Buys

arXiv:2210.11757 · cs.CL · October 24, 2022

TL;DR

The paper presents a single multilingual machine translation model for English and 8 South / South East African languages. It uses overlap BPE, back-translation, synthetic training data generation, and additional translation directions during training to improve translation quality in low-resource settings.

Contribution

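It introduces a single multilingual model that translates between English and 8 South / South East African languages, as well as between specific pairs of those African languages, and shows that techniques suited to low-resource MT remain effective when bilingual training data is limited.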

Findings

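01 The low-resource techniques help most for directions with very little or no bilingual training data

02 Back-translation and synthetic training data generation improve performance

03 A single multilingual model handles English-African directions and specific African-African pairs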

Abstract

This paper describes the University of Cape Town's submission to the constrained track of the WMT22 Shared Task: Large-Scale Machine Translation Evaluation for African Languages. Our system is a single multilingual translation model that translates between English and 8 South / South East African languages, as well as between specific pairs of the African languages. We used several techniques suited to low-resource machine translation (MT), including overlap BPE, back-translation, synthetic training data generation, and adding more translation directions during training. Our results show the value of these techniques, especially for directions where very little or no bilingual training data is available.
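To make the back-translation idea in the abstract concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' pipeline. The function `back_translate`, the stand-in `toy_reverse_model`, the `<2xxx>` target-language tag format, and the language codes `eng` and `zul` are all hypothetical and chosen only for illustration. The idea shown is the general one: monolingual text in the target language is translated back into the source language by a reverse-direction model, and the resulting synthetic source sentences are paired with the original target sentences as extra training data.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical interface: any function that maps
# (sentence, from_lang, to_lang) to a translated sentence.
Translator = Callable[[str, str, str], str]


def back_translate(
    target_sentences: Iterable[str],
    src_lang: str,
    tgt_lang: str,
    reverse_translate: Translator,
) -> List[Tuple[str, str]]:
    """Build synthetic (source, target) pairs from monolingual target-side text."""
    pairs: List[Tuple[str, str]] = []
    for tgt in target_sentences:
        tgt = tgt.strip()
        if not tgt:
            continue  # skip empty lines in the monolingual corpus
        # Translate target -> source with the reverse-direction model.
        synthetic_src = reverse_translate(tgt, tgt_lang, src_lang)
        # Assumed convention (not taken from the paper): prefix the source
        # with a tag naming the language the multilingual model should produce.
        pairs.append((f"<2{tgt_lang}> {synthetic_src}", tgt))
    return pairs


def toy_reverse_model(sentence: str, from_lang: str, to_lang: str) -> str:
    """Stand-in for a trained reverse model; it only marks the direction."""
    return f"[{from_lang}->{to_lang}] {sentence}"


# Illustrative monolingual target-side lines; the empty string is skipped.
mono_target = ["Sawubona mhlaba", "", "Ngiyabonga"]
synthetic = back_translate(mono_target, "eng", "zul", toy_reverse_model)
for source, target in synthetic:
    print(source, "|||", target)
```

In a real setting `toy_reverse_model` would be replaced by a trained target-to-source model, and the synthetic pairs would be mixed with the genuine bilingual data. The target side of each pair stays human-written, which is why the approach is attractive when bilingual data is scarce but monolingual text is available.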

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Byte Pair Encoding