Mixed Cross Entropy Loss for Neural Machine Translation
Haoran Li, Wei Lu

TL;DR
This paper introduces mixed cross entropy loss for neural machine translation, improving training stability and translation quality by relaxing the one-to-one mapping assumption and mitigating exposure bias.
Contribution
It proposes mixed CE, a loss function that replaces standard CE when training auto-regressive neural machine translation models with either teacher forcing or scheduled sampling.
Findings
Mixed CE outperforms standard CE on WMT'16 Ro-En, WMT'16 Ru-En, and WMT'14 En-De in both teacher forcing and scheduled sampling setups.
Models trained with mixed CE better handle paraphrased references.
Mixed CE reduces exposure bias and improves probability distribution over translations.
Abstract
In neural machine translation, cross entropy (CE) is the standard loss function in two training methods of auto-regressive models, i.e., teacher forcing and scheduled sampling. In this paper, we propose mixed cross entropy loss (mixed CE) as a substitute for CE in both training approaches. In teacher forcing, the model trained with CE regards the translation problem as a one-to-one mapping process, while in mixed CE this process can be relaxed to one-to-many. In scheduled sampling, we show that mixed CE has the potential to encourage the training and testing behaviours to be similar to each other, more effectively mitigating the exposure bias problem. We demonstrate the superiority of mixed CE over CE on several machine translation datasets, WMT'16 Ro-En, WMT'16 Ru-En, and WMT'14 En-De in both teacher forcing and scheduled sampling setups. Furthermore, in WMT'14 En-De, we also find…
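The abstract does not state the formula for mixed CE, so the following is only a minimal sketch under a stated assumption: that at each target position the loss interpolates between the negative log-likelihood of the gold token and that of the model's own most probable token, with a mixing weight in [0, 1]. The names `mixed_ce`, `standard_ce`, and `mix` are hypothetical and chosen for illustration; they are not taken from the authors' code. With `mix = 0` the sketch reduces to ordinary CE, which is the one-to-one case the abstract describes.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def mixed_ce(step_logits, gold_ids, mix):
    """Sentence-level loss under the ASSUMED interpolation form.

    step_logits: one list of logits per target position.
    gold_ids:    gold token index per target position.
    mix:         hypothetical mixing weight in [0, 1]; 0 gives standard CE.
    """
    total = 0.0
    for logits, gold in zip(step_logits, gold_ids):
        logp = log_softmax(logits)
        # Model's own most probable token at this position (greedy choice).
        pred = max(range(len(logp)), key=logp.__getitem__)
        # Assumption: weight the gold token by (1 - mix) and the model's
        # own prediction by mix, so a plausible alternative token is not
        # penalized as heavily as under plain CE.
        total += -(1.0 - mix) * logp[gold] - mix * logp[pred]
    return total


def standard_ce(step_logits, gold_ids):
    """Ordinary cross entropy, i.e. the mix = 0 special case."""
    return mixed_ce(step_logits, gold_ids, 0.0)


# Toy example: two target positions, vocabulary of three tokens.
logits = [[2.0, 1.0, 0.1], [0.2, 0.1, 3.0]]
gold = [1, 2]
ce_loss = standard_ce(logits, gold)
mixed_loss = mixed_ce(logits, gold, 0.3)
```

In this toy input the model disagrees with the reference at the first position, so the mixed value is lower than the plain CE value. Under the assumed form, that is how the one-to-many relaxation would show up numerically.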
Taxonomy
Topics: Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
