Mixed Cross Entropy Loss for Neural Machine Translation

Haoran Li; Wei Lu

arXiv:2106.15880 · cs.CL · July 1, 2021 · 1 citation

TL;DR

This paper introduces mixed cross entropy loss (mixed CE) for neural machine translation. Mixed CE relaxes the one-to-one mapping assumption of standard CE in teacher forcing and mitigates exposure bias in scheduled sampling, improving translation quality.

Contribution

The paper proposes mixed CE, a loss function that replaces standard CE when training auto-regressive translation models with either teacher forcing or scheduled sampling.

Findings

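01. Mixed CE outperforms standard CE on WMT'16 Ro-En, WMT'16 Ru-En, and WMT'14 En-De, in both teacher forcing and scheduled sampling setups.

02. Models trained with mixed CE handle paraphrased references better than models trained with CE.

03. Mixed CE mitigates exposure bias more effectively than CE and yields a better probability distribution over translations.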

Abstract

In neural machine translation, cross entropy (CE) is the standard loss function in two training methods of auto-regressive models, i.e., teacher forcing and scheduled sampling. In this paper, we propose mixed cross entropy loss (mixed CE) as a substitute for CE in both training approaches. In teacher forcing, the model trained with CE regards the translation problem as a one-to-one mapping process, while in mixed CE this process can be relaxed to one-to-many. In scheduled sampling, we show that mixed CE has the potential to encourage the training and testing behaviours to be similar to each other, more effectively mitigating the exposure bias problem. We demonstrate the superiority of mixed CE over CE on several machine translation datasets, WMT'16 Ro-En, WMT'16 Ru-En, and WMT'14 En-De in both teacher forcing and scheduled sampling setups. Furthermore, in WMT'14 En-De, we also find…
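As a rough illustration of the teacher-forcing case described in the abstract, the sketch below mixes the usual CE term on the reference token with a second CE term on the model's own most likely token, which is one way a loss can stop insisting on a single correct output per step. The weighting scheme, the choice of the greedy (argmax) token as the second target, and the names `log_softmax`, `mixed_ce`, and `mix` are assumptions made for illustration; they are not stated on this page and may differ from the authors' implementation in haorannlp/mix.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of vocabulary logits."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_sum for x in logits]


def mixed_ce(step_logits, gold_ids, mix):
    """Average per-token loss mixing two targets (illustrative assumption).

    step_logits: one list of vocabulary logits per target position.
    gold_ids:    reference token id for each position.
    mix:         weight in [0, 1] on the model's own argmax token;
                 mix = 0 gives standard cross entropy.
    """
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")
    if len(step_logits) != len(gold_ids) or not gold_ids:
        raise ValueError("need one gold id per position, at least one position")

    total = 0.0
    for logits, gold in zip(step_logits, gold_ids):
        log_probs = log_softmax(logits)
        # Assumed second target: the token the model currently prefers.
        predicted = max(range(len(log_probs)), key=log_probs.__getitem__)
        total -= (1.0 - mix) * log_probs[gold] + mix * log_probs[predicted]
    return total / len(gold_ids)


# Toy example: 2 target positions, vocabulary of 3 tokens.
toy_logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
toy_gold = [0, 2]  # the model agrees at position 0, disagrees at position 1
standard_loss = mixed_ce(toy_logits, toy_gold, mix=0.0)
mixed_loss = mixed_ce(toy_logits, toy_gold, mix=0.3)
```

In this sketch, `mix = 0` recovers standard CE, and any position where the model's preferred token differs from the reference is penalized less as `mix` grows. How `mix` should be set or scheduled during training is left open here.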

Code & Models

Repositories

haorannlp/mix (PyTorch, official)

Videos

Mixed Cross Entropy Loss for Neural Machine Translation · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning