An Exploration of Data Augmentation Techniques for Improving English to Tigrinya Translation

Lidia Kidane; Sachin Kumar; Yulia Tsvetkov

arXiv:2103.16789·cs.CL·April 6, 2021·AfricaNLP·5 cites

TL;DR

This paper investigates several back-translation methods for improving English to Tigrinya translation and finds that, in low-resource conditions, pivoting through a higher-resource language related to Tigrinya yields substantial improvements over baselines.

Contribution

The study provides a detailed analysis of back-translation techniques for Tigrinya, highlighting the effectiveness of pivoting through related languages in low-resource translation tasks.

Findings

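01. Pivot-based back-translation yields the largest improvements among the back-translation methods studied.

02. Synthetic source sentences generated by back-translation substantially improve translation quality over baselines.

03. In low-resource conditions, pivoting through a higher-resource language related to the target language is the most effective approach.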

Abstract

It has been shown that the performance of neural machine translation (NMT) drops starkly in low-resource conditions, often requiring large amounts of auxiliary data to achieve competitive results. An effective method of generating auxiliary data is back-translation of target-language sentences. In this work, we present a case study of Tigrinya in which we investigate several back-translation methods for generating synthetic source sentences. We find that in low-resource conditions, back-translation by pivoting through a higher-resource language related to the target language proves most effective, resulting in substantial improvements over baselines.
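The pivot-based back-translation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`pivot_back_translate`, `augment`), the toy lookup-table translators, and the placeholder tokens are all hypothetical and stand in for trained NMT systems. The sketch only shows the data flow: monolingual target-language sentences are translated into a related pivot language, then into the source language, and the resulting synthetic source sentences are paired with the original target sentences and added to the real parallel data.

```python
from typing import Callable, List, Tuple

# A translator is any function mapping one sentence to another.
# In practice each would wrap a trained NMT model (assumption).
Translator = Callable[[str], str]


def pivot_back_translate(
    target_sentences: List[str],
    target_to_pivot: Translator,
    pivot_to_source: Translator,
) -> List[Tuple[str, str]]:
    """Build synthetic (source, target) pairs from monolingual target text."""
    pairs = []
    for tgt in target_sentences:
        pivot = target_to_pivot(tgt)             # target -> related pivot language
        synthetic_src = pivot_to_source(pivot)   # pivot -> source language
        pairs.append((synthetic_src, tgt))       # target side stays human-written
    return pairs


def augment(
    real_pairs: List[Tuple[str, str]],
    synthetic_pairs: List[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Concatenate real and synthetic parallel data for training."""
    return list(real_pairs) + list(synthetic_pairs)


# Toy stand-ins for the two translation systems (hypothetical placeholder tokens).
TGT_TO_PIVOT = {"tgt_hello": "piv_hello", "tgt_world": "piv_world"}
PIVOT_TO_SRC = {"piv_hello": "hello", "piv_world": "world"}

synthetic = pivot_back_translate(
    ["tgt_hello", "tgt_world"],
    target_to_pivot=lambda s: TGT_TO_PIVOT[s],
    pivot_to_source=lambda s: PIVOT_TO_SRC[s],
)
augmented = augment([("good morning", "tgt_good_morning")], synthetic)
```

The forward (source to target) model is then trained on `augmented`. Keeping the human-written sentence on the target side is the usual reason back-translation is applied in this direction: noise from the intermediate systems lands on the input side rather than on the output the model learns to produce.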

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification