Generative-Adversarial Networks for Low-Resource Language Data Augmentation in Machine Translation

Linda Zeng

arXiv:2409.00071 · cs.CL · October 28, 2025

TL;DR

This paper explores using GANs to generate additional low-resource language data to improve neural machine translation. Trained on under 20,000 sentences in a simulated low-resource setting, the model shows promising initial results at data augmentation.

Contribution

The paper introduces a novel GAN-based data augmentation method designed for low-resource NMT, addressing the scarcity of training data.

Findings

01 GANs can generate plausible monolingual sentences for low-resource languages.

02 The approach shows potential in improving data availability for low-resource NMT.

03 Initial results indicate promise for future development of GAN-based augmentation.

Abstract

Neural Machine Translation (NMT) systems struggle when translating to and from low-resource languages, which lack large-scale data corpora for models to use for training. As manual data curation is expensive and time-consuming, we propose utilizing a generative-adversarial network (GAN) to augment low-resource language data. When training on a very small amount of language data (under 20,000 sentences) in a simulated low-resource setting, our model shows potential at data augmentation, generating monolingual language data with sentences such as "ask me that healthy lunch im cooking up," and "my grandfather work harder than your grandfather before." Our novel data augmentation approach takes the first step in investigating the capability of GANs in low-resource NMT, and our results suggest that there is promise for future extension of GANs to low-resource NMT.
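The abstract describes a simulated low-resource setting with fewer than 20,000 training sentences. One plausible way to build such a setting is to subsample a larger monolingual corpus, sketched below. This is only an illustration under stated assumptions: the function name `simulate_low_resource`, the exact cap of 19,999, the fixed seed, and the toy corpus are all hypothetical, and the paper may construct its data differently.

```python
import random


def simulate_low_resource(sentences, max_sentences=19_999, seed=0):
    """Return a random subset of at most `max_sentences` sentences.

    Hypothetical helper: the cap and the seed are illustrative choices,
    not values taken from the paper beyond "under 20,000 sentences".
    """
    if len(sentences) <= max_sentences:
        # Corpus is already small enough; return a copy unchanged.
        return list(sentences)
    rng = random.Random(seed)  # local RNG so the subset is reproducible
    return rng.sample(sentences, max_sentences)


# Toy stand-in for a large monolingual corpus (assumption, not real data).
corpus = [f"sentence number {i}" for i in range(50_000)]
subset = simulate_low_resource(corpus)
print(len(subset))
```

Sampling without replacement keeps every retained sentence distinct, and fixing the seed makes the reduced corpus reproducible across runs.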

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling