Why Neural Machine Translation Prefers Empty Outputs

Xing Shi; Yijun Xiao; Kevin Knight

arXiv:2012.13454·cs.CL·December 29, 2020·5 cites

Open Access

TL;DR

This paper examines why neural machine translation (NMT) systems assign high probability to empty translations. It attributes the behavior to label smoothing and to the use of a single, high-frequency end-of-sequence (EoS) token for target sentences of every length, and it proposes length-specific EoS types as a mitigation.

Contribution

The study identifies two causes of high-probability empty translations in NMT and shows that using different EoS types for target sentences of different lengths exposes and eliminates the implicit smoothing introduced by a shared EoS token.

Findings

01. Label smoothing reduces confidence in correct-length translations.

02. Using different EoS tokens for varying sentence lengths decreases empty outputs.

03. Explicitly addressing implicit smoothing improves translation quality.
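
The second finding, length-dependent EoS tokens, can be pictured as a small preprocessing step on the target side. The sketch below is only an illustration under stated assumptions: the token format `<eos_N>`, the function names, and the `max_tracked` cap (added here to keep the vocabulary finite) are hypothetical and are not taken from the paper.

```python
def eos_for_length(length, max_tracked=100):
    """Return a hypothetical length-specific EoS token.

    Assumption: one EoS type per exact target length, with all lengths
    at or above `max_tracked` sharing a single type.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    return "<eos_{}>".format(min(length, max_tracked))


def append_length_eos(tokens, max_tracked=100):
    """Append the EoS type that matches the number of target words."""
    return list(tokens) + [eos_for_length(len(tokens), max_tracked)]


if __name__ == "__main__":
    print(append_length_eos(["das", "ist", "gut"]))
    print(append_length_eos([]))  # the empty translation gets its own EoS type
```

With this labeling, the EoS type that ends an empty translation no longer shares parameters or training counts with the EoS type that ends, say, a 20-word sentence, which is the separation the finding describes.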

Abstract

We investigate why neural machine translation (NMT) systems assign high probability to empty translations. We find two explanations. First, label smoothing makes correct-length translations less confident, making it easier for the empty translation to finally outscore them. Second, NMT systems use the same, high-frequency EoS word to end all target sentences, regardless of length. This creates an implicit smoothing that increases zero-length translations. Using different EoS types in target sentences of different lengths exposes and eliminates this implicit smoothing.
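
To make the first explanation concrete, here is a minimal sketch of label smoothing in plain Python. It assumes the common variant that puts `1 - epsilon` on the gold token and spreads `epsilon` uniformly over the remaining vocabulary; the function names and this particular variant are assumptions for illustration, not the paper's experimental setup.

```python
import math


def smoothed_target(gold, vocab_size, epsilon):
    """Label-smoothed target distribution for one decoding step.

    Assumption: 1 - epsilon on the gold token, epsilon shared equally
    by the other vocab_size - 1 tokens.
    """
    off_value = epsilon / (vocab_size - 1)
    dist = [off_value] * vocab_size
    dist[gold] = 1.0 - epsilon
    return dist


def best_case_logprob(num_words, epsilon):
    """Log-probability of a reference under a model that reproduces the
    smoothed targets exactly: num_words word steps plus one EoS step."""
    return (num_words + 1) * math.log(1.0 - epsilon)


if __name__ == "__main__":
    for n in (5, 20, 50):
        print(n, best_case_logprob(n, epsilon=0.1))
```

In this toy setting the best achievable score of a reference falls linearly with its length, and every non-final step reserves a small amount of probability for EoS, since EoS is one of the non-gold tokens. Both effects point in the direction the abstract describes: longer, correct-length outputs become less confident relative to very short ones.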

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: Label Smoothing