TL;DR
This paper investigates the challenges of long-tailed token distributions in neural machine translation, proposing a new loss function that improves low-frequency word generation.
Contribution
It introduces the Anti-Focal loss, a training objective that incorporates the inductive biases of beam search into training, improving NMT performance on rare tokens.
Findings
Significant improvements in low-frequency word translation accuracy.
The Anti-Focal loss outperforms standard cross-entropy across multiple datasets.
Enhanced handling of long-tailed phenomena in structured prediction tasks.
Abstract
State-of-the-art Neural Machine Translation (NMT) models struggle to generate low-frequency tokens, and addressing this remains a major challenge. The analysis of long-tailed phenomena in structured prediction tasks is further hindered by the added complexity of search during inference. In this work, we quantitatively characterize such long-tailed phenomena at two levels of abstraction: token classification and sequence generation. We propose a new loss function, the Anti-Focal loss, which better adapts model training to the structural dependencies of conditional text generation by incorporating the inductive biases of beam search into the training process. We show the efficacy of the proposed technique on a number of Machine Translation (MT) datasets, demonstrating that it leads to significant gains over cross-entropy across different language pairs, especially on the…
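The abstract names the Anti-Focal loss but does not state its formula. As a rough illustration only, the sketch below assumes the per-token form -(1 + alpha * p_t)^gamma * log(p_t), where p_t is the model's probability for the reference token. This form, the function names, and the alpha and gamma values are assumptions made for this example, not details confirmed by the summary. Cross-entropy and the standard focal loss are included for comparison.

```python
import math


def cross_entropy(p_t: float) -> float:
    """Standard per-token cross-entropy: -log(p_t)."""
    return -math.log(p_t)


def focal_loss(p_t: float, gamma: float = 1.0) -> float:
    """Focal loss: the factor (1 - p_t)^gamma shrinks the loss on confident tokens."""
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def anti_focal_loss(p_t: float, alpha: float = 1.0, gamma: float = 1.0) -> float:
    """ASSUMED form of the Anti-Focal loss: -(1 + alpha * p_t)^gamma * log(p_t).

    alpha and gamma are illustrative hyperparameters, not the paper's settings.
    """
    return -((1.0 + alpha * p_t) ** gamma) * math.log(p_t)


if __name__ == "__main__":
    # Compare the three losses at a few reference-token probabilities.
    for p in (0.1, 0.5, 0.9):
        print(
            f"p_t={p:.1f}  CE={cross_entropy(p):.4f}  "
            f"focal={focal_loss(p):.4f}  anti-focal={anti_focal_loss(p):.4f}"
        )
```

Under this assumed form, the modulating factor grows with p_t, so confident predictions are not down-weighted the way they are under focal loss, and setting alpha to 0 recovers plain cross-entropy.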
