Parameter Differentiation based Multilingual Neural Machine Translation
Qian Wang, Jiajun Zhang

TL;DR
This paper introduces a dynamic parameter differentiation method for multilingual neural machine translation, enabling the model to adaptively determine which parameters should be language-specific, leading to improved translation performance.
Contribution
It proposes a novel, training-driven approach inspired by cellular differentiation that automatically identifies language-specific parameters based on gradient similarity.
Findings
The method significantly outperforms baseline models with various parameter sharing configurations.
The parameter sharing configuration the model arrives at aligns with linguistic similarities among languages.
Dynamic differentiation improves translation quality across multiple languages.
Abstract
Multilingual neural machine translation (MNMT) aims to translate multiple languages with a single model and has proved successful thanks to effective knowledge transfer among different languages with shared parameters. However, it is still an open question which parameters should be shared and which ones need to be task-specific. Currently, the common practice is to heuristically design or search for language-specific modules, which makes it difficult to find the optimal configuration. In this paper, we propose a novel parameter differentiation based method that allows the model to determine which parameters should be language-specific during training. Inspired by cellular differentiation, each shared parameter in our method can dynamically differentiate into more specialized types. We further define the differentiation criterion as inter-task gradient similarity. Therefore, parameters with…
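The differentiation criterion named in the abstract, inter-task gradient similarity, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy gradients, the use of the minimum pairwise cosine similarity, and the threshold of 0.0 are all assumptions made here for illustration. The sketch treats a shared parameter as a candidate for differentiation when at least one pair of translation tasks pushes it in conflicting directions.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two flat gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def min_pairwise_similarity(task_grads):
    """Lowest cosine similarity over all task pairs for one shared parameter."""
    tasks = sorted(task_grads)
    return min(
        cosine(task_grads[a], task_grads[b]) for a, b in combinations(tasks, 2)
    )


def should_differentiate(task_grads, threshold=0.0):
    """Hypothetical rule: flag the parameter when some pair of tasks has
    gradient similarity below `threshold` (an assumed value, not from the paper)."""
    return min_pairwise_similarity(task_grads) < threshold


# Toy per-task gradients for a single shared parameter (made-up numbers).
grads = {
    "en-de": [1.0, 0.5, 0.0],
    "en-fr": [0.9, 0.6, 0.1],
    "en-zh": [-1.0, -0.4, 0.2],
}

flag_all = should_differentiate(grads)
flag_de_fr = should_differentiate({k: grads[k] for k in ("en-de", "en-fr")})
```

In this toy setup the en-de and en-zh gradients point in roughly opposite directions, so the parameter is flagged when all three tasks are considered, while en-de and en-fr alone agree and leave it shared. In a real model the gradients would come from backpropagation on held-out batches per language pair rather than from hand-written lists.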
Taxonomy
Topics: Natural Language Processing Techniques · Neural Networks and Applications · Topic Modeling
