On Long-Tailed Phenomena in Neural Machine Translation

Vikas Raunak; Siddharth Dalmia; Vivek Gupta; Florian Metze

arXiv:2010.04924·cs.CL·October 13, 2020

TL;DR

This paper investigates the challenges of long-tailed token distributions in neural machine translation, proposing a new loss function that improves low-frequency word generation.

Contribution

It introduces the Anti-Focal loss, a new training objective that incorporates the inductive biases of beam search into the training process, improving NMT performance on rare tokens.
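The Anti-Focal loss can be sketched as a modulated token-level cross-entropy. This is a minimal sketch under an assumption this page does not state: that the loss takes the form -(1 + p_t)^γ · log p_t, where p_t is the model probability of the gold token. That is the focal loss -(1 - p_t)^γ · log p_t with its modulating factor flipped. Check the paper and the vyraun/long-tailed repository for the exact definition. The function names (log_softmax, modulated_nll) and the mode strings are hypothetical. Plain Python is used instead of PyTorch so the arithmetic is easy to follow.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax: subtract the max logit before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def modulated_nll(
    logits_batch: Sequence[Sequence[float]],
    targets: Sequence[int],
    gamma: float = 1.0,
    mode: str = "anti_focal",
) -> float:
    """Mean token-level loss -w(p_t) * log p_t over a batch of positions.

    mode="cross_entropy": w(p) = 1
    mode="focal":         w(p) = (1 - p) ** gamma
    mode="anti_focal":    w(p) = (1 + p) ** gamma  (ASSUMED Anti-Focal form)
    """
    total = 0.0
    for logits, t in zip(logits_batch, targets):
        log_p = log_softmax(logits)[t]  # log-probability of the gold token
        p = math.exp(log_p)
        if mode == "cross_entropy":
            w = 1.0
        elif mode == "focal":
            w = (1.0 - p) ** gamma
        elif mode == "anti_focal":
            w = (1.0 + p) ** gamma
        else:
            raise ValueError(f"unknown mode: {mode}")
        total += -w * log_p
    return total / len(targets)


# Toy batch: two decoding positions over a 3-token vocabulary.
logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
targets = [0, 2]
for mode in ("cross_entropy", "focal", "anti_focal"):
    print(mode, modulated_nll(logits, targets, gamma=1.0, mode=mode))
```

Under this assumed form, (1 + p_t)^γ ≥ 1 for γ ≥ 0, so the per-token cross-entropy term is never shrunk, whereas the focal factor (1 - p_t)^γ ≤ 1 always shrinks it. With γ = 0 all three modes reduce to plain cross-entropy.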

Findings

01

Significant improvements in low-frequency word translation accuracy.

02

The Anti-Focal loss outperforms standard cross-entropy across multiple datasets.

03

Enhanced handling of long-tailed phenomena in structured prediction tasks.

Abstract

State-of-the-art Neural Machine Translation (NMT) models struggle with generating low-frequency tokens, tackling which remains a major challenge. The analysis of long-tailed phenomena in the context of structured prediction tasks is further hindered by the added complexities of search during inference. In this work, we quantitatively characterize such long-tailed phenomena at two levels of abstraction, namely, token classification and sequence generation. We propose a new loss function, the Anti-Focal loss, to better adapt model training to the structural dependencies of conditional text generation by incorporating the inductive biases of beam search in the training process. We show the efficacy of the proposed technique on a number of Machine Translation (MT) datasets, demonstrating that it leads to significant gains over cross-entropy across different language pairs, especially on the…
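One way to quantify the token-level side of this characterization is to bucket reference tokens by their training-set frequency and measure how often each bucket is reproduced in the system output. The sketch below is hypothetical and is not the paper's evaluation protocol. It uses clipped unigram recall per frequency bin. The bin edges, the function names (bin_of, per_bin_recall), and the toy data are illustrative assumptions.

```python
from collections import Counter
from typing import List, Sequence


def bin_of(count: int, edges: Sequence[int]) -> int:
    # edges are ascending inclusive upper bounds; e.g. (1, 3) gives
    # bin 0: count <= 1, bin 1: 2..3, bin 2: > 3.
    for i, upper in enumerate(edges):
        if count <= upper:
            return i
    return len(edges)


def per_bin_recall(
    train_tokens: Sequence[str],
    references: Sequence[Sequence[str]],
    hypotheses: Sequence[Sequence[str]],
    edges: Sequence[int],
) -> List[float]:
    """Clipped unigram recall of reference tokens, grouped by training frequency.

    Hypothetical analysis: each reference token counts as matched at most as
    many times as it appears in the paired hypothesis.
    """
    freq = Counter(train_tokens)
    n_bins = len(edges) + 1
    matched = [0] * n_bins
    total = [0] * n_bins
    for ref, hyp in zip(references, hypotheses):
        ref_counts, hyp_counts = Counter(ref), Counter(hyp)
        for tok, c in ref_counts.items():
            b = bin_of(freq[tok], edges)  # tokens unseen in training fall in bin 0
            total[b] += c
            matched[b] += min(c, hyp_counts[tok])
    return [m / t if t else float("nan") for m, t in zip(matched, total)]


train = ["the"] * 5 + ["cat"] * 2 + ["sat"]
refs = [["the", "cat", "sat"]]
hyps = [["the", "cat", "on"]]
print(per_bin_recall(train, refs, hyps, edges=(1, 3)))
```

On the toy data, the rarest bin (the single-occurrence token "sat") gets recall 0.0 while the more frequent bins get 1.0, which is the kind of frequency-dependent gap the token-level analysis is meant to expose.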

Code & Models

Repositories

vyraun/long-tailed
PyTorch · Official
