Efficiently Upgrading Multilingual Machine Translation Models to Support More Languages
Simeng Sun, Maha Elbayad, Anna Sun, James Cross

TL;DR
This paper presents techniques to efficiently upgrade multilingual machine translation models to support more languages, reducing computation and preventing catastrophic forgetting while maintaining or improving performance.
Contribution
It introduces three techniques (careful network initialization, learning rate scaling, and data up-sampling) that speed up learning of the new languages and let an existing model be reused despite vocabulary and architecture mismatches.
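Two of the techniques named above, data up-sampling and learning rate scaling, can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not give their formulas, so the sketch assumes the temperature-based sampling commonly used in multilingual training and a plain per-group learning rate multiplier. The function names, the group names `reused` and `new`, and all numeric values are hypothetical.

```python
def sampling_probs(sizes, temperature=5.0):
    """Temperature-based sampling: p_i is proportional to (n_i / N) ** (1 / T).

    Assumption: this is a common up-sampling scheme, not necessarily the
    one used in the paper. T = 1 keeps the raw data proportions; a larger
    T moves probability mass toward the smaller (newly added) data sets.
    """
    total = sum(sizes.values())
    weights = {k: (n / total) ** (1.0 / temperature) for k, n in sizes.items()}
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}


def scaled_learning_rates(base_lr, groups, scale):
    """Per-group learning rate: base_lr times the group's multiplier.

    Groups without an entry in `scale` keep the base rate. Which groups are
    scaled, and in which direction, is a hypothetical choice here.
    """
    return {g: base_lr * scale.get(g, 1.0) for g in groups}


# Hypothetical example: 900 sentence pairs for old languages, 100 for new ones.
probs = sampling_probs({"old": 900, "new": 100}, temperature=5.0)
lrs = scaled_learning_rates(1e-3, ["reused", "new"], {"reused": 0.5})
```

With the hypothetical sizes above, raising the temperature increases the share of batches drawn from the smaller data set relative to its raw 10% proportion, which is the effect up-sampling is meant to achieve.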
Findings
Exceeds the performance of a same-sized baseline with 30% less computation.
Recovers the performance of a larger model with over 50% less computation.
The three techniques also mitigate catastrophic forgetting.
Abstract
With multilingual machine translation (MMT) models continuing to grow in size and number of supported languages, it is natural to reuse and upgrade existing models to save computation as data becomes available in more languages. However, adding new languages requires updating the vocabulary, which complicates the reuse of embeddings. The question of how to reuse existing models while also making architectural changes to provide capacity for both old and new languages has also not been closely studied. In this work, we introduce three techniques that help speed up effective learning of the new languages and alleviate catastrophic forgetting despite vocabulary and architecture mismatches. Our results show that by (1) carefully initializing the network, (2) applying learning rate scaling, and (3) performing data up-sampling, it is possible to exceed the performance of a same-sized baseline…
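The vocabulary mismatch described in the abstract can be made concrete with a small sketch of embedding reuse. It assumes a simple heuristic that is not confirmed by this summary: rows for tokens shared between the old and new vocabularies are copied, and rows for unseen tokens start from the mean of the old embeddings plus a little Gaussian noise. The function name, the `noise` parameter, and the toy vocabularies are hypothetical.

```python
import random


def init_upgraded_embeddings(old_vocab, old_emb, new_vocab, seed=0, noise=0.01):
    """Build an embedding table for new_vocab, reusing old rows where possible.

    Assumption: mean-plus-noise initialization for unseen tokens is a common
    heuristic, used here only for illustration.
    """
    rng = random.Random(seed)
    dim = len(old_emb[0])
    # Per-dimension mean of the old embedding table.
    mean = [sum(row[d] for row in old_emb) / len(old_emb) for d in range(dim)]
    old_index = {tok: i for i, tok in enumerate(old_vocab)}

    new_emb = []
    for tok in new_vocab:
        if tok in old_index:
            # Token exists in the old vocabulary: copy its trained embedding.
            new_emb.append(list(old_emb[old_index[tok]]))
        else:
            # New token: start near the mean of the old embeddings.
            new_emb.append([m + rng.gauss(0.0, noise) for m in mean])
    return new_emb


# Hypothetical toy vocabularies; "c" is the only token without an old embedding.
old_vocab = ["a", "b"]
old_emb = [[1.0, 2.0], [3.0, 4.0]]
new_emb = init_upgraded_embeddings(old_vocab, old_emb, ["b", "c", "a"])
```

Because the lookup is by token string rather than by index, the reused rows stay aligned even when the new vocabulary orders its tokens differently from the old one.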
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
