NSL-MT: Linguistically Informed Negative Samples for Efficient Machine Translation in Low-Resource Languages

Mamadou K. Keita; Christopher Homan; Huy Le

arXiv:2511.09537 · cs.LG · May 7, 2026

TL;DR

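NSL-MT is a training method for low-resource machine translation that augments limited parallel data with synthetically generated violations of the target language's grammar and penalizes the model for assigning them high probability, improving both translation quality and data efficiency.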

Contribution

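The paper introduces negative space learning for machine translation (NSL-MT), which uses linguistically invalid target sentences as explicit negative training signal to improve translation quality and data efficiency for underresourced languages.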

Findings

01

Achieves 3-12% BLEU gains over baseline training for models that already perform well.

02

Provides 56-89% gains for models that lack decent initial support for the language.

03

Offers a 5x data efficiency multiplier: training with 1,000 examples matches or exceeds normal training with 5,000 examples.

Abstract

We introduce negative space learning machine translation (NSL-MT), a training method for underresourced languages, that augments limited parallel data with synthetically generated violations of the target language's grammar and explicitly penalizes the model when it assigns high probability to these linguistically invalid outputs. NSL-MT delivers improvements across all baselines we tested, including 3-12% BLEU gains for well-performing models and 56-89% gains for models lacking decent initial support. Furthermore, NSL-MT provides a 5x data efficiency multiplier: training with 1,000 examples matches or exceeds normal training with 5,000 examples. NSL-MT thus provides a data-efficient alternative training method for settings where parallel data is limited.
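The abstract does not give the loss function, so the following is only a minimal sketch of the general idea, under stated assumptions. It pairs the usual negative log-likelihood on a reference translation with a hinge-style penalty that activates when a grammar-violating sentence scores too close to the reference. The function names (`swap_adjacent_words`, `penalized_loss`), the word-order swap as the violation type, the hinge form of the penalty, and the `penalty_weight` and `margin` values are all hypothetical illustrations, not the authors' implementation.

```python
import math


def swap_adjacent_words(sentence, index):
    """Build a negative sample by swapping the words at index and index + 1.

    Hypothetical stand-in for a grammar-violation generator: a word-order
    swap is only one possible kind of violation.
    """
    words = sentence.split()
    words[index], words[index + 1] = words[index + 1], words[index]
    return " ".join(words)


def mean_log_prob(token_probs):
    """Length-normalised log-probability of a sequence of token probabilities."""
    return sum(math.log(p) for p in token_probs) / len(token_probs)


def penalized_loss(ref_token_probs, neg_token_probs,
                   penalty_weight=0.5, margin=1.0):
    """NLL on the reference plus a penalty on the invalid output.

    Assumption: the penalty is a hinge on the gap between the reference
    score and the negative-sample score. It is zero once the reference
    outscores the negative sample by at least `margin`.
    """
    # Standard translation objective: negative log-likelihood of the reference.
    nll = -sum(math.log(p) for p in ref_token_probs)
    # How much better the model scores the reference than the invalid sentence.
    gap = mean_log_prob(ref_token_probs) - mean_log_prob(neg_token_probs)
    penalty = max(0.0, margin - gap)
    return nll + penalty_weight * penalty
```

In this sketch, a model that assigns the invalid sentence the same per-token probabilities as the reference pays the full penalty, while a model that already scores the invalid sentence far lower pays only the ordinary likelihood term. In a real system the token probabilities would come from the translation model's decoder rather than being passed in as lists.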
