Exploring Intrinsic Language-specific Subspaces in Fine-tuning Multilingual Neural Machine Translation
Zhe Cao, Zhi Qu, Hidetaka Kamigaito, Taro Watanabe

TL;DR
This paper shows that fine-tuning a multilingual neural machine translation model for a given language takes place in a language-specific subspace, and introduces lightweight methods that outperform full fine-tuning while training only 0.4% to 1.6% of the parameters.
Contribution
The authors propose language-specific LoRA and architecture learning techniques with pruning to identify minimal intrinsic subspaces for efficient multilingual fine-tuning.
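The idea of a language-specific LoRA layer can be sketched roughly as follows: one shared weight matrix stays frozen, and each language gets its own small low-rank update that is selected at run time. This is a minimal illustration under stated assumptions, not the authors' implementation. The class name, the language codes, the per-language ranks, and the rule of selecting the adapter by a single language key are all hypothetical.

```python
import random


def matvec(m, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LanguageSpecificLoRALinear:
    """Frozen shared weight plus one low-rank update (B @ A) per language.

    Hypothetical sketch: names and selection rule are assumptions.
    """

    def __init__(self, d_in, d_out, ranks, seed=0):
        rng = random.Random(seed)
        # Shared pretrained weight (d_out x d_in); frozen during fine-tuning.
        self.weight = [[rng.gauss(0, 0.02) for _ in range(d_in)]
                       for _ in range(d_out)]
        # One (A, B) pair per language; the rank may differ per language.
        self.adapters = {}
        for lang, r in ranks.items():
            a = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(r)]
            # B starts at zero, so every adapter is initially a no-op.
            b = [[0.0] * r for _ in range(d_out)]
            self.adapters[lang] = (a, b)

    def forward(self, x, lang):
        base = matvec(self.weight, x)
        a, b = self.adapters[lang]
        # Only the adapter of the selected language contributes.
        delta = matvec(b, matvec(a, x))
        return [h + d for h, d in zip(base, delta)]

    def trainable_params(self, lang):
        a, b = self.adapters[lang]
        return sum(len(row) for row in a) + sum(len(row) for row in b)


# Illustrative setup: a larger rank for the second (assumed lower-resource) language.
layer = LanguageSpecificLoRALinear(d_in=8, d_out=8, ranks={"en": 2, "wo": 4})
x = [1.0] * 8
y_en = layer.forward(x, "en")
print(layer.trainable_params("en"), layer.trainable_params("wo"))
```

Because each language only ever updates its own pair of small matrices, gradient updates for one language cannot overwrite another language's adapter, which is one plausible reading of how such a design limits negative interactions.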
Findings
Outperforms full fine-tuning by up to 2.25 spBLEU points.
Reduces trainable parameters to 0.4% for high/medium-resource languages.
Reduces trainable parameters to 1.6% for low-resource languages.
Abstract
Multilingual neural machine translation models support fine-tuning on hundreds of languages simultaneously. However, fine-tuning all parameters is inefficient and can lead to negative interactions among languages. In this work, we demonstrate that fine-tuning for a language occurs in its intrinsic language-specific subspace, which involves only a tiny fraction of the full parameter set. Thus, we propose language-specific LoRA to isolate intrinsic language-specific subspaces. Furthermore, we propose architecture learning techniques and introduce a gradual pruning schedule during fine-tuning to exhaustively explore the optimal setting and the minimal intrinsic subspaces for each language, resulting in a lightweight yet effective fine-tuning procedure. The experimental results on a 12-language subset and a 30-language subset of FLORES-101 show that our methods not only outperform…
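To make the notion of a gradual pruning schedule concrete, here is a small sketch using a cubic sparsity ramp, a commonly used schedule that may or may not match the one in the paper. The function names, the step counts, the final sparsity of 0.75, and the choice to prune individual weights by magnitude are all assumptions made for illustration.

```python
def cubic_sparsity(step, start_step, end_step, final_sparsity,
                   initial_sparsity=0.0):
    """Target sparsity at `step`, ramping from initial to final on a cubic curve."""
    if step <= start_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    progress = (step - start_step) / (end_step - start_step)
    # Sparsity rises quickly at first, then flattens near the end.
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


def magnitude_mask(values, sparsity):
    """Return a 0/1 mask that zeroes the `sparsity` fraction of smallest magnitudes."""
    n_prune = int(len(values) * sparsity)
    order = sorted(range(len(values)), key=lambda i: abs(values[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(values))]


# Hypothetical adapter weights, flattened into one list.
weights = [0.5, -0.01, 0.3, 0.02, -0.7, 0.1, -0.05, 0.9]
for step in (0, 50, 100):
    s = cubic_sparsity(step, start_step=0, end_step=100, final_sparsity=0.75)
    mask = magnitude_mask(weights, s)
    print(step, round(s, 5), mask)
```

In a real fine-tuning loop the mask would be recomputed at intervals and applied to the trainable parameters, so that the subspace shrinks gradually instead of being cut in a single step.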
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Pruning
