Efficiently Upgrading Multilingual Machine Translation Models to Support More Languages

Simeng Sun; Maha Elbayad; Anna Sun; James Cross

arXiv:2302.03528·cs.CL·February 8, 2023

Open Access

TL;DR

This paper presents techniques for efficiently upgrading multilingual machine translation models to support more languages, reducing computation and alleviating catastrophic forgetting while matching or exceeding baseline performance.

Contribution

It introduces three techniques (careful network initialization, learning rate scaling, and data up-sampling) that speed up learning of the new languages and allow an existing model to be reused despite vocabulary and architecture mismatches.
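As an illustration of the data up-sampling idea, here is a minimal sketch of temperature-based sampling, a scheme commonly used in multilingual MT to give low-resource directions more weight. The summary above does not say which up-sampling scheme the paper uses, so the formula, the function name `sampling_probs`, the `temperature` parameter, and the corpus sizes are all assumptions made for illustration.

```python
def sampling_probs(sizes, temperature=5.0):
    """Return per-direction sampling probabilities.

    Hypothetical helper: each direction's share of the data is raised to
    the power 1/temperature and renormalized. temperature=1.0 keeps the
    original proportions; larger values flatten the distribution, which
    up-samples the smaller directions.
    """
    total = sum(sizes.values())
    scaled = {
        direction: (count / total) ** (1.0 / temperature)
        for direction, count in sizes.items()
    }
    norm = sum(scaled.values())
    return {direction: weight / norm for direction, weight in scaled.items()}


# Illustrative (made-up) corpus sizes: one old high-resource direction
# and one newly added low-resource direction.
sizes = {"en-fr": 1_000_000, "en-sw": 10_000}
probs = sampling_probs(sizes, temperature=5.0)
proportional = sampling_probs(sizes, temperature=1.0)
```

With these made-up sizes, the new direction is drawn far more often than its raw share of the data would suggest, which is the intended effect of up-sampling.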

Findings

01. Achieves better performance than a same-sized baseline with 30% less computation.

02. Recovers the performance of larger models with over 50% less computation.

03. The techniques mitigate catastrophic forgetting.

Abstract

With multilingual machine translation (MMT) models continuing to grow in size and number of supported languages, it is natural to reuse and upgrade existing models to save computation as data becomes available in more languages. However, adding new languages requires updating the vocabulary, which complicates the reuse of embeddings. The question of how to reuse existing models while also making architectural changes to provide capacity for both old and new languages has also not been closely studied. In this work, we introduce three techniques that help speed up effective learning of the new languages and alleviate catastrophic forgetting despite vocabulary and architecture mismatches. Our results show that by (1) carefully initializing the network, (2) applying learning rate scaling, and (3) performing data up-sampling, it is possible to exceed the performance of a same-sized baseline…
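To make the vocabulary-mismatch problem concrete, the following sketch shows one plausible way to reuse embeddings when the vocabulary changes: rows for tokens present in both the old and new vocabularies are copied, and rows for unseen tokens are drawn at random. This is an assumption for illustration, not the paper's stated initialization procedure; the function `init_embeddings`, the Gaussian initializer, and the toy vocabularies are hypothetical.

```python
import random


def init_embeddings(old_vocab, old_emb, new_vocab, dim, seed=0):
    """Build an embedding table for new_vocab, reusing rows where possible.

    Hypothetical helper: tokens shared with old_vocab copy their trained
    vector; tokens that are new get a random Gaussian vector with
    standard deviation dim ** -0.5 (an assumed, common default).
    Returns the new table and the number of reused rows.
    """
    rng = random.Random(seed)
    old_index = {token: i for i, token in enumerate(old_vocab)}
    new_emb = []
    reused = 0
    for token in new_vocab:
        if token in old_index:
            # Copy so later updates do not alias the old table.
            new_emb.append(list(old_emb[old_index[token]]))
            reused += 1
        else:
            new_emb.append([rng.gauss(0.0, dim ** -0.5) for _ in range(dim)])
    return new_emb, reused


# Toy example: the new vocabulary reorders old tokens and adds one new token.
old_vocab = ["<pad>", "the", "cat"]
old_emb = [[0.0, 0.0], [0.1, 0.2], [0.3, 0.4]]
new_vocab = ["<pad>", "cat", "paka", "the"]
new_emb, reused = init_embeddings(old_vocab, old_emb, new_vocab, dim=2)
```

Matching by token string rather than by index matters here, because a rebuilt vocabulary generally assigns different indices to the same tokens.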

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings