On the Shortcut Learning in Multilingual Neural Machine Translation
Wenxuan Wang, Wenxiang Jiao, Jen-tse Huang, Zhaopeng Tu, Michael R. Lyu

TL;DR
This paper investigates the off-target issue in multilingual neural machine translation, attributing it to shortcut learning of language mappings, and proposes a simple training strategy to improve zero-shot translation without extra costs.
Contribution
It identifies shortcut learning as the cause of off-target translation in MNMT and introduces a data removal method during training to mitigate this issue.
Findings
Removing shortcut-inducing instances improves zero-shot translation accuracy.
Shortcut learning occurs mainly in later training stages and is worsened by multilingual pretraining.
The proposed method enhances MNMT performance across various models and benchmarks.
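The data-removal idea in the findings above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the summary does not say which instances count as shortcut-inducing or when removal starts, so the sketch assumes they are the pairs that translate a non-centric source into the centric language, and that they are dropped once training passes a hypothetical `switch_step`. The names `Example`, `is_shortcut_inducing`, `select_training_data`, and the default centric language `"en"` are all made up for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Example:
    """One parallel training instance (hypothetical container)."""
    src_lang: str
    tgt_lang: str
    src_text: str
    tgt_text: str


def is_shortcut_inducing(ex: Example, centric: str = "en") -> bool:
    # Assumption: an instance may induce the shortcut when it maps a
    # non-centric source language into the centric target language.
    return ex.src_lang != centric and ex.tgt_lang == centric


def select_training_data(
    examples: Sequence[Example],
    step: int,
    switch_step: int,
    centric: str = "en",
) -> List[Example]:
    """Return the instances to train on at a given step.

    Before `switch_step` (a hypothetical hyperparameter) all data is used,
    as in standard training. From `switch_step` onward, the assumed
    shortcut-inducing instances are filtered out.
    """
    if step < switch_step:
        return list(examples)
    return [ex for ex in examples if not is_shortcut_inducing(ex, centric)]
```

With an English-centric corpus containing both de→en and en→de pairs, calling `select_training_data(corpus, step, switch_step)` late in training would keep only the en→de style pairs, leaving the model to train without the (non-centric, centric) mappings in that phase.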
Abstract
In this study, we revisit the commonly cited off-target issue in multilingual neural machine translation (MNMT). By carefully designing experiments on different MNMT scenarios and models, we attribute the off-target issue to the overfitting of the shortcuts of (non-centric, centric) language mappings. Specifically, the learned shortcuts bias MNMT to mistakenly translate non-centric languages into the centric language instead of the expected non-centric language for zero-shot translation. Analyses of learning dynamics show that the shortcut learning generally occurs in the later stage of model training, and multilingual pretraining accelerates and aggravates the shortcut learning. Based on these observations, we propose a simple and effective training strategy to eliminate the shortcuts in MNMT models by leveraging the forgetting nature of model training. The only difference from the…
Taxonomy
Topics: Natural Language Processing Techniques
