The Implicit Length Bias of Label Smoothing on Beam Search Decoding
Bowen Liang, Pidong Wang, Yuan Cao

TL;DR
This paper reveals that label smoothing in neural machine translation introduces a length bias favoring shorter outputs during beam search decoding, and proposes a rectification method to improve translation quality.
Contribution
It uncovers the implicit length bias caused by label smoothing and introduces a simple inference-time correction to mitigate this effect.
Findings
Label smoothing causes a length bias towards shorter translations.
Applying the inference-time rectification improves BLEU scores consistently on WMT English-German, English-French, English-Czech and English-Chinese.
For a model fully optimized with label smoothing, translation length is implicitly upper bounded by a fixed constant that does not depend on the input.
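One way to see why such a bound could arise is a back-of-the-envelope calculation. The sketch below is not taken from the paper; it assumes uniform label smoothing with weight `epsilon` over a vocabulary of size `vocab_size`, a model that exactly matches the smoothed targets, and hypotheses scored by unnormalized log-probability. Under those assumptions every token, including end-of-sequence, receives probability at least `epsilon / vocab_size` and at most `1 - epsilon + epsilon / vocab_size`, so a long hypothesis eventually scores below simply stopping. The function name and the example values are hypothetical.

```python
import math


def max_length_bound(vocab_size: int, epsilon: float) -> float:
    """Illustrative upper bound on output length under the stated assumptions.

    A hypothesis of L tokens has log-probability at most L * log(per_token_cap),
    while emitting end-of-sequence immediately scores at least log(eos_floor).
    The longer hypothesis can only win while
        L * log(per_token_cap) > log(eos_floor).
    """
    per_token_cap = 1.0 - epsilon + epsilon / vocab_size  # largest probability any token can get
    eos_floor = epsilon / vocab_size                      # smallest probability EOS can get
    return math.log(eos_floor) / math.log(per_token_cap)


# Illustrative values only; not the paper's experimental settings.
bound = max_length_bound(vocab_size=32000, epsilon=0.1)
print(f"length bound: {bound:.1f} tokens")
```

With these illustrative values the bound comes out at roughly 120 tokens, and it depends only on `epsilon` and `vocab_size`, not on the source sentence. Larger smoothing weights tighten it.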
Abstract
Label smoothing is ubiquitously applied in Neural Machine Translation (NMT) training. While label smoothing offers a desired regularization effect during model training, in this paper we demonstrate that it nevertheless introduces length biases in the beam search decoding procedure. Our analysis shows that label smoothing implicitly applies a length penalty term to output sequence, causing a bias towards shorter translations. We also show that for a model fully optimized with label smoothing, translation length is implicitly upper bounded by a fixed constant independent of input. We verify our theory by applying a simple rectification function at inference time to restore the unbiased distributions from the label-smoothed model predictions. This rectification method led to consistent quality improvements on WMT English-German, English-French, English-Czech and English-Chinese tasks, up…
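The abstract does not spell out the rectification function, so the following is only a minimal sketch of what such a correction might look like. It assumes uniform label smoothing with a known weight `epsilon`, so that the model's prediction is approximately `(1 - epsilon) * p + epsilon / V` for an unsmoothed distribution `p` over `V` tokens, and it inverts that mixture. The name `rectify`, the `floor` parameter, and the clipping-then-renormalizing step are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def rectify(p_smoothed, epsilon=0.1, floor=1e-9):
    """Approximately undo uniform label smoothing on predicted probabilities.

    Assumes p_smoothed ~= (1 - epsilon) * p + epsilon / V along the last axis.
    """
    p_smoothed = np.asarray(p_smoothed, dtype=np.float64)
    vocab_size = p_smoothed.shape[-1]
    # Subtract the uniform mass added by smoothing and rescale.
    p = (p_smoothed - epsilon / vocab_size) / (1.0 - epsilon)
    # A real model is never exactly optimal, so clip small or negative values
    # (hypothetical choice) and renormalize to get a valid distribution.
    p = np.maximum(p, floor)
    return p / p.sum(axis=-1, keepdims=True)


# Toy check: smooth a known distribution, then recover it.
true_p = np.array([0.7, 0.2, 0.1, 0.0])
eps = 0.1
smoothed = (1 - eps) * true_p + eps / true_p.size
recovered = rectify(smoothed, epsilon=eps)
```

In a decoder, a function like this would be applied to each step's softmax output before taking the log and adding it to the beam score.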
Taxonomy
Topics: Natural Language Processing Techniques · Machine Learning and Data Classification · Topic Modeling
Methods: Label Smoothing
