Language Tokens: A Frustratingly Simple Approach Improves Zero-Shot Performance of Multilingual Translation
Muhammad ElNokrashy (1), Amr Hendy (1), Mohamed Maher (1), Mohamed Afify (1), Hany Hassan Awadalla (2) ((1) Microsoft ATL Cairo, (2) Microsoft, Redmond)

TL;DR
This paper adds source- and target-language tokens to the encoder and decoder inputs, improving zero-shot and direct (X-to-Y) multilingual translation on in-house and WMT data.
Contribution
The paper presents a straightforward token-based method to improve multilingual translation, effective in both zero-shot and supervised scenarios, with consistent performance gains.
Findings
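Gains of nearly 10 BLEU points on in-house datasets, depending on the checkpoint selection criteria
Improves From-English translation in a WMT evaluation by 4.17 BLEU points (zero-shot) and 2.87 BLEU points (with direct training data)
Improves translation in the low-resource setting by 1.5 to 1.7 BLEU points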
Abstract
This paper proposes a simple yet effective method to improve direct (X-to-Y) translation in both cases: zero-shot and when direct data is available. We modify the input tokens at both the encoder and decoder to include signals for the source and target languages. We show a performance gain when training from scratch, or finetuning a pretrained model with the proposed setup. In the experiments, our method shows a gain of nearly 10.0 BLEU points on in-house datasets, depending on the checkpoint selection criteria. In a WMT evaluation campaign, From-English performance improves by 4.17 and 2.87 BLEU points in the zero-shot setting and when direct data is available for training, respectively, while X-to-Y improves by 1.29 BLEU over the zero-shot baseline and 0.44 over the many-to-many baseline. In the low-resource setting, we see a 1.5 to 1.7 point improvement when finetuning on X-to-Y domain…
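The token modification described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact layout, so the following is an assumption made for illustration only: both a source-language tag and a target-language tag are prepended to the encoder input, and a target-language tag is prepended to the decoder input. The function name `add_language_tokens` and the tag formats `<src_xx>` and `<tgt_yy>` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def add_language_tokens(
    src_tokens: List[str],
    tgt_tokens: List[str],
    src_lang: str,
    tgt_lang: str,
) -> Tuple[List[str], List[str]]:
    """Prepend language tags to encoder and decoder token sequences.

    Assumption (not confirmed by the abstract): the encoder sees both the
    source and target tags, and the decoder sees the target tag.
    """
    src_tag = f"<src_{src_lang}>"  # hypothetical tag format
    tgt_tag = f"<tgt_{tgt_lang}>"  # hypothetical tag format

    # Encoder input carries signals for both languages.
    encoder_input = [src_tag, tgt_tag] + list(src_tokens)
    # Decoder input carries the target-language signal.
    decoder_input = [tgt_tag] + list(tgt_tokens)
    return encoder_input, decoder_input


if __name__ == "__main__":
    enc, dec = add_language_tokens(["Guten", "Tag"], ["Bonjour"], "de", "fr")
    print(enc)
    print(dec)
```

In a real system the tags would also have to be added to the model vocabulary so that each one maps to a single embedding; that step depends on the tokenizer in use and is left out here.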
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Cancer-related molecular mechanisms research
