Improving Language Transfer Capability of Decoder-only Architecture in Multilingual Neural Machine Translation
Zhi Qu, Yiran Wang, Chenchen Ding, Hideki Tanaka, Masao Utiyama, Taro Watanabe

TL;DR
This paper improves decoder-only multilingual neural machine translation by introducing a two-stage decoding process and contrastive learning on translation instructions, which raises zero-shot translation quality relative to traditional encoder-decoder models.
Contribution
The paper proposes a two-stage decoding method and a contrastive learning objective to strengthen language transfer in decoder-only NMT architectures, which have previously underperformed when trained solely on parallel data.
Findings
Achieved up to 3.39 BLEU improvement in zero-shot translation.
Improved zero-shot translation metrics by up to 6.99 chrF++, 3.22 BERTScore, and 4.81 COMET.
Demonstrated competitive performance in supervised translation tasks.
Abstract
Existing multilingual neural machine translation (MNMT) approaches mainly focus on improving models with the encoder-decoder architecture to translate multiple languages. However, decoder-only architecture has been explored less in MNMT due to its underperformance when trained on parallel data solely. In this work, we attribute the issue of the decoder-only architecture to its lack of language transfer capability. Specifically, the decoder-only architecture is insufficient in encoding source tokens with the target language features. We propose dividing the decoding process into two stages so that target tokens are explicitly excluded in the first stage to implicitly boost the transfer capability across languages. Additionally, we impose contrastive learning on translation instructions, resulting in improved performance in zero-shot translation. We conduct experiments on TED-19 and…
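The two-stage idea in the abstract can be pictured as two attention masks over a concatenated source-plus-target sequence. The following is only a minimal sketch, not the authors' implementation: the function name `two_stage_masks`, the use of causal attention within the source, and the representation of masks as nested boolean lists are all assumptions made for illustration.

```python
def two_stage_masks(src_len, tgt_len):
    """Build illustrative attention masks for a two-stage decoder-only pass.

    mask[i][j] is True when position i may attend to position j.
    Positions 0..src_len-1 are source tokens; the rest are target tokens.
    Assumption: causal attention is used throughout (the source does not
    state how source tokens attend to each other).
    """
    total = src_len + tgt_len

    # Ordinary causal mask over the full sequence (source, then target).
    causal = [[j <= i for j in range(total)] for i in range(total)]

    # Stage 1: target tokens are excluded, so only the source-to-source
    # block of the causal mask stays active.
    stage1 = [
        [causal[i][j] and i < src_len and j < src_len for j in range(total)]
        for i in range(total)
    ]

    # Stage 2: target tokens join, attending to the source and to earlier
    # target positions.
    stage2 = causal
    return stage1, stage2


if __name__ == "__main__":
    s1, s2 = two_stage_masks(src_len=3, tgt_len=2)
    for row in s1:
        print("".join("x" if v else "." for v in row))
```

In a real model the first mask would apply to the earlier layers and the second to the later ones; where that split falls is a design choice the abstract does not specify.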
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Contrastive Learning · Focus
