Improving Language Transfer Capability of Decoder-only Architecture in Multilingual Neural Machine Translation

Zhi Qu; Yiran Wang; Chenchen Ding; Hideki Tanaka; Masao Utiyama; Taro Watanabe

arXiv:2412.02101·cs.CL·December 4, 2024

TL;DR

This paper enhances decoder-only multilingual neural machine translation by introducing a two-stage decoding process and contrastive learning, significantly improving zero-shot translation performance compared to traditional encoder-decoder models.

Contribution

It proposes a two-stage decoding method and a contrastive learning approach to boost language transfer in decoder-only NMT architectures, addressing their underperformance when trained solely on parallel data.
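As a rough illustration of the two-stage idea, here is a minimal NumPy sketch. It is not the authors' PhasedDecoder implementation. The abstract states that target tokens are explicitly excluded in the first stage; this sketch models that by running the first `n_stage1` layers on the source embeddings alone and appending the target embeddings only before the remaining layers. The toy single-head attention, the names `causal_self_attention` and `two_stage_decode`, the even layer split, and the omission of any translation instruction are all hypothetical choices.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Toy single-head causal self-attention with a residual connection.
    x has shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(x.shape[-1])
    n = x.shape[0]
    # Position i may attend only to positions <= i.
    future = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return x + weights @ v

def two_stage_decode(src_emb, tgt_emb, layers, n_stage1):
    """Hypothetical two-stage pass: stage 1 sees source tokens only;
    stage 2 sees the stage-1 source states followed by target tokens."""
    h = src_emb
    for w_q, w_k, w_v in layers[:n_stage1]:
        h = causal_self_attention(h, w_q, w_k, w_v)  # target tokens excluded
    src_states = h
    h = np.concatenate([src_states, tgt_emb], axis=0)
    for w_q, w_k, w_v in layers[n_stage1:]:
        h = causal_self_attention(h, w_q, w_k, w_v)
    return src_states, h

rng = np.random.default_rng(0)
d_model = 8
# Four toy layers, each a (W_q, W_k, W_v) triple; two layers per stage.
layers = [tuple(rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(3))
          for _ in range(4)]
src = rng.normal(size=(5, d_model))
tgt = rng.normal(size=(3, d_model))
src_states, full = two_stage_decode(src, tgt, layers, n_stage1=2)
print(src_states.shape, full.shape)  # (5, 8) (8, 8)
```

Because the target embeddings enter only at layer `n_stage1`, they pass through fewer layers and attend to source states that the first stage has already transformed. How the actual model splits its layers and where it places the translation instruction should be checked against the zhiqu22/PhasedDecoder repository.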

Findings

01

Achieved up to 3.39 BLEU improvement in zero-shot translation.

02

Improved zero-shot translation metrics by up to 6.99 chrF++, 3.22 BERTScore, and 4.81 COMET.

03

Demonstrated competitive performance in supervised translation tasks.

Abstract

Existing multilingual neural machine translation (MNMT) approaches mainly focus on improving models with the encoder-decoder architecture to translate multiple languages. However, the decoder-only architecture has been explored less in MNMT due to its underperformance when trained solely on parallel data. In this work, we attribute the issue of the decoder-only architecture to its lack of language transfer capability. Specifically, the decoder-only architecture is insufficient at encoding source tokens with target language features. We propose dividing the decoding process into two stages so that target tokens are explicitly excluded in the first stage to implicitly boost the transfer capability across languages. Additionally, we impose contrastive learning on translation instructions, resulting in improved performance in zero-shot translation. We conduct experiments on TED-19 and…
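The abstract says contrastive learning is imposed on translation instructions but does not specify which representations are compared or which loss is used. A common formulation is InfoNCE, sketched below in NumPy under a hypothetical pairing: row i of `anchors` and row i of `positives` are pooled representations that should agree (for example, two inputs carrying the same translation instruction), and every other row in the batch serves as a negative. The function name `info_nce` and the temperature of 0.1 are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: anchors[i] should be closest to positives[i];
    all other rows of `positives` act as in-batch negatives.
    Hypothetical pairing; the paper's exact objective may differ."""
    # L2-normalize so the dot product is cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                # shape (batch, batch)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the matching index (the diagonal) as the target.
    return float(-np.mean(np.diag(log_probs)))

# Toy check: perfectly aligned pairs vs. deliberately mismatched pairs.
reps = np.eye(4)
print(info_nce(reps, reps) < info_nce(reps, reps[::-1]))  # True
```

A lower loss means each anchor is more similar to its own positive than to the other items in the batch; the temperature controls how sharply that contrast is enforced.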

Code & Models

Repositories

zhiqu22/PhasedDecoder
pytorch · Official

Videos

Improving Language Transfer Capability of Decoder-only Architecture in Multilingual Neural Machine Translation · Underline

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Contrastive Learning · Focus