Why Neural Machine Translation Prefers Empty Outputs
Xing Shi, Yijun Xiao, Kevin Knight

TL;DR
This paper investigates why neural machine translation (NMT) systems assign high probability to empty translations. It traces the problem to label smoothing and to the use of a single end-of-sequence (EoS) token for every target sentence, and it proposes length-specific EoS tokens as a remedy.
Contribution
The study identifies two causes of empty translations in NMT and introduces a technique that assigns different EoS types to target sentences of different lengths, which exposes and eliminates the implicit smoothing created by a shared EoS token.
Findings
Label smoothing reduces confidence in correct-length translations.
Using different EoS tokens for varying sentence lengths decreases empty outputs.
Explicitly addressing implicit smoothing improves translation quality.
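The first finding can be made concrete with a small sketch. It assumes the standard label-smoothing formulation (the gold token receives 1 - epsilon plus its share of a uniform epsilon spread over the vocabulary) and an idealized model that matches those smoothed targets exactly; the function names and the value epsilon = 0.1 are illustrative assumptions, not taken from the paper.

```python
import math


def smooth_targets(gold_index, vocab_size, epsilon=0.1):
    """Label-smoothed target distribution for one output position.

    Assumes the common formulation: epsilon is spread uniformly over
    the vocabulary and the remaining 1 - epsilon goes to the gold token.
    """
    uniform_share = epsilon / vocab_size
    dist = [uniform_share] * vocab_size
    dist[gold_index] += 1.0 - epsilon
    return dist


def best_case_log_prob(length, vocab_size, epsilon=0.1):
    """Upper bound on the log-probability of a correct output with
    `length` tokens plus one EoS, for an idealized model that
    reproduces the smoothed targets at every position."""
    p_gold = 1.0 - epsilon + epsilon / vocab_size
    return (length + 1) * math.log(p_gold)


if __name__ == "__main__":
    vocab_size = 32000  # hypothetical vocabulary size
    for n in (5, 20, 50):
        print(n, round(best_case_log_prob(n, vocab_size), 3))
```

Under these assumptions the score ceiling for a correct translation falls linearly with its length, while smoothing also guarantees every token, EoS included, a nonzero share of probability at the first position. This is only an illustration of the mechanism, not the paper's own analysis.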
Abstract
We investigate why neural machine translation (NMT) systems assign high probability to empty translations. We find two explanations. First, label smoothing makes correct-length translations less confident, making it easier for the empty translation to finally outscore them. Second, NMT systems use the same, high-frequency EoS word to end all target sentences, regardless of length. This creates an implicit smoothing that increases zero-length translations. Using different EoS types in target sentences of different lengths exposes and eliminates this implicit smoothing.
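As a minimal sketch of the second idea, target sentences can be preprocessed so that the closing EoS token depends on sentence length. The abstract does not specify how lengths map to EoS types, so the bucketing scheme, the bucket size, and the `<eos_k>` token names below are all hypothetical choices made for illustration.

```python
def eos_for_length(length, bucket_size=10):
    """Return a length-specific EoS token (hypothetical naming scheme).

    Length 0 gets its own type so that ending an empty output never
    shares statistics with the EoS of ordinary sentences; lengths
    1..bucket_size map to <eos_1>, the next bucket to <eos_2>, etc.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    if length == 0:
        return "<eos_0>"
    bucket = 1 + (length - 1) // bucket_size
    return f"<eos_{bucket}>"


def tag_target(tokens, bucket_size=10):
    """Append the length-specific EoS token to a tokenized target."""
    tokens = list(tokens)
    return tokens + [eos_for_length(len(tokens), bucket_size)]


if __name__ == "__main__":
    print(tag_target(["guten", "Morgen"]))
    print(tag_target(["w"] * 25)[-1])
```

With a scheme like this, the token that would end a zero-length output is rare in the training data rather than being the single most frequent EoS symbol, which is the intuition behind removing the implicit smoothing.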
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Label Smoothing
