Counter-Interference Adapter for Multilingual Machine Translation
Yaoming Zhu, Jiangtao Feng, Chengqi Zhao, Mingxuan Wang, Lei Li

TL;DR
This paper introduces CIAT, an adapted Transformer model that reduces cross-language interference in multilingual machine translation and outperforms strong multilingual baselines on 64 of 66 language directions.
Contribution
The paper presents CIAT, a novel adapter for Transformer models that mitigates interference in multilingual translation with minimal additional parameters.
Findings
Outperforms strong multilingual baselines on 64 of 66 language directions
42 of those directions improve by more than 0.5 BLEU
Gains are consistent across the IWSLT, OPUS-100, and WMT benchmarks
Abstract
Developing a unified multilingual model has long been a pursuit for machine translation. However, existing approaches suffer from performance degradation: a single multilingual model is inferior to separately trained bilingual ones on rich-resource languages. We conjecture that this phenomenon is due to interference caused by joint training with multiple languages. To address the issue, we propose CIAT, an adapted Transformer model with a small parameter overhead for multilingual machine translation. We evaluate CIAT on multiple benchmark datasets, including IWSLT, OPUS-100, and WMT. Experiments show that CIAT consistently outperforms strong multilingual baselines on 64 of 66 language directions, 42 of which see an improvement of more than 0.5 BLEU. Our code is available at https://github.com/Yaoming95/CIAT.
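To give a rough sense of what "a small parameter overhead" can mean for an adapter-style model, the sketch below counts the parameters of a generic residual bottleneck adapter (a down-projection followed by an up-projection) and compares the total against a base model. This is not CIAT's actual architecture, which the abstract does not specify; the function names and every number (model width, bottleneck size, layer count, base model size) are illustrative assumptions.

```python
def adapter_param_count(d_model: int, bottleneck: int) -> int:
    """Parameters in one generic bottleneck adapter (hypothetical design).

    Counts a down-projection (d_model -> bottleneck) and an
    up-projection (bottleneck -> d_model), each with a bias vector.
    """
    down = d_model * bottleneck + bottleneck
    up = bottleneck * d_model + d_model
    return down + up


def overhead_ratio(d_model: int, bottleneck: int, n_layers: int, base_params: int) -> float:
    """Fraction of extra parameters when one adapter is added per layer."""
    added = adapter_param_count(d_model, bottleneck) * n_layers
    return added / base_params


# Illustrative values only; none of these come from the paper.
D_MODEL = 512             # assumed Transformer hidden size
BOTTLENECK = 64           # assumed adapter bottleneck width
N_LAYERS = 12             # assumed 6 encoder + 6 decoder layers
BASE_PARAMS = 60_000_000  # assumed size of the shared base model

per_adapter = adapter_param_count(D_MODEL, BOTTLENECK)
ratio = overhead_ratio(D_MODEL, BOTTLENECK, N_LAYERS, BASE_PARAMS)
print(f"parameters per adapter: {per_adapter}")
print(f"overhead relative to base model: {ratio:.2%}")
```

Under these assumed sizes the added parameters are a small fraction of the base model, which is the general appeal of adapter-based approaches; the real overhead of CIAT depends on its design and should be taken from the paper or its released code.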
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Softmax · Label Smoothing · Layer Normalization · Residual Connection · Multi-Head Attention · Byte Pair Encoding
