Can Multilinguality benefit Non-autoregressive Machine Translation?
Sweta Agrawal, Julia Kreutzer, Colin Cherry

TL;DR
This paper investigates whether multilingual training can improve non-autoregressive (NAR) machine translation models, analyzing transfer between languages, the choice of bilingual versus multilingual distillation teachers, and how performance scales with model size.
Contribution
It presents a comprehensive empirical study of multilingual NAR models, using CTC as the NAR model and Imputer as the semi-NAR model, and covers transfer effects, distillation data strategies, and performance scaling.
Findings
Multilingual NAR models benefit from positive transfer between related languages.
Capacity constraints can lead to negative transfer effects.
Scaling NAR models improves performance relative to AR models.
Abstract
Non-autoregressive (NAR) machine translation has recently achieved significant improvements, and now outperforms autoregressive (AR) models on some benchmarks, providing an efficient alternative to AR inference. However, while AR translation is often implemented using multilingual models that benefit from transfer between languages and from improved serving efficiency, multilingual NAR models remain relatively unexplored. Taking Connectionist Temporal Classification (CTC) as an example NAR model and Imputer as a semi-NAR model, we present a comprehensive empirical study of multilingual NAR. We test its capabilities with respect to positive transfer between related languages and negative transfer under capacity constraints. As NAR models require distilled training sets, we carefully study the impact of bilingual versus multilingual teachers. Finally, we fit a scaling law for multilingual…
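For readers unfamiliar with CTC, the sketch below illustrates the standard CTC collapsing rule that turns per-position predictions into an output sequence: merge consecutive repeats, then drop blanks. It is a minimal illustration, not the authors' implementation; the blank id `0`, the function names, and the toy scores are assumptions made for this example.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumed id of the CTC blank symbol (hypothetical choice)


def ctc_collapse(frame_ids: Sequence[int], blank: int = BLANK) -> List[int]:
    """Apply the CTC collapsing rule: merge consecutive repeats, then drop blanks."""
    merged = [tok for tok, _ in groupby(frame_ids)]
    return [tok for tok in merged if tok != blank]


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Pick the best token at every position independently, then collapse.

    Each position is scored without conditioning on the other outputs,
    which is what makes the decoding non-autoregressive.
    """
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    return ctc_collapse(best, blank)


if __name__ == "__main__":
    # Toy scores over a 3-symbol vocabulary (0 = blank) for 5 positions.
    scores = [
        [0.1, 0.8, 0.1],
        [0.2, 0.7, 0.1],
        [0.9, 0.05, 0.05],
        [0.1, 0.6, 0.3],
        [0.1, 0.2, 0.7],
    ]
    print(ctc_greedy_decode(scores))
```

A blank between two identical tokens keeps them distinct after collapsing, which is how a CTC model can emit the same token twice in a row.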
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
