Learning Language-Specific Layers for Multilingual Machine Translation
Telmo Pessoa Pires, Robin M. Schmidt, Yi-Hsiu Liao, Stephan Peitz

TL;DR
This paper introduces Language-Specific Transformer Layers (LSLs) for multilingual machine translation. LSLs increase model capacity while keeping the computation in the forward pass constant, and they improve translation quality across multiple languages.
Contribution
The paper proposes adding language-specific layers to the Transformer encoder, with their placement chosen by a neural-architecture-search-inspired approach, to improve multilingual translation performance.
Findings
Achieved improvements of 1.3 chrF and 1.5 spBLEU with a separate decoder architecture.
Achieved improvements of 1.9 chrF and 2.2 spBLEU with a shared decoder architecture.
Demonstrated an increase in model capacity with no additional computation in the forward pass.
Abstract
Multilingual Machine Translation promises to improve translation quality between non-English languages. This is advantageous for several reasons, namely lower latency (no need to translate twice), and reduced error cascades (e.g., avoiding losing gender and formality information when translating through English). On the downside, adding more languages reduces model capacity per language, which is usually countered by increasing the overall model size, making training harder and inference slower. In this work, we introduce Language-Specific Transformer Layers (LSLs), which allow us to increase model capacity, while keeping the amount of computation and the number of parameters used in the forward pass constant. The key idea is to have some layers of the encoder be source or target language-specific, while keeping the remaining layers shared. We study the best way to place these layers…
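The routing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Linear` stand-in for a full Transformer layer, the class names, and the `["shared", "src", "shared", "tgt"]` layout are all assumptions chosen for illustration. Each language-specific layer stores one sublayer per language but runs only the one selected by the source or target language, so total parameters grow with the number of languages while the parameters used in a single forward pass stay the same as in a fully shared encoder.

```python
import random


class Linear:
    """Tiny dense layer y = Wx + b on plain lists (stand-in for a Transformer layer)."""

    def __init__(self, dim, rng):
        self.weight = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]
        self.bias = [0.0] * dim

    def __call__(self, x):
        return [sum(w * v for w, v in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]

    def num_params(self):
        return len(self.weight) * len(self.weight[0]) + len(self.bias)


class LanguageSpecificLayer:
    """Holds one sublayer per language; only the selected sublayer is executed."""

    def __init__(self, dim, languages, rng):
        self.sublayers = {lang: Linear(dim, rng) for lang in languages}

    def __call__(self, x, lang):
        return self.sublayers[lang](x)

    def num_params(self):
        # All stored parameters, across every language.
        return sum(layer.num_params() for layer in self.sublayers.values())

    def active_params(self):
        # Parameters touched by one forward pass (a single sublayer).
        return next(iter(self.sublayers.values())).num_params()


class Encoder:
    """Stack of layers; `layout` marks each as "shared", "src", or "tgt" (hypothetical)."""

    def __init__(self, dim, languages, layout, seed=0):
        rng = random.Random(seed)
        self.layout = list(layout)
        self.layers = [
            Linear(dim, rng) if kind == "shared"
            else LanguageSpecificLayer(dim, languages, rng)
            for kind in self.layout
        ]

    def __call__(self, x, src, tgt):
        for kind, layer in zip(self.layout, self.layers):
            if kind == "shared":
                x = layer(x)
            else:
                # Route by source or target language, depending on the layer type.
                x = layer(x, src if kind == "src" else tgt)
        return x

    def total_params(self):
        return sum(layer.num_params() for layer in self.layers)

    def active_params(self):
        return sum(
            layer.num_params() if kind == "shared" else layer.active_params()
            for kind, layer in zip(self.layout, self.layers)
        )


languages = ["de", "en", "es"]
lsl_encoder = Encoder(4, languages, ["shared", "src", "shared", "tgt"])
baseline = Encoder(4, languages, ["shared"] * 4)

x = [1.0, 0.5, -0.5, 2.0]
out_de = lsl_encoder(x, src="en", tgt="de")
out_es = lsl_encoder(x, src="en", tgt="es")

print("total params (LSL):   ", lsl_encoder.total_params())
print("active params (LSL):  ", lsl_encoder.active_params())
print("total params (shared):", baseline.total_params())
```

In this toy setup the same input produces different encoder outputs for different target languages, because the final layer is selected by `tgt`. Which positions should be shared, source-specific, or target-specific is the placement question the paper studies; the layout used here is only a placeholder.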
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Adam · Layer Normalization · Linear Layer · Label Smoothing · Dropout · Byte Pair Encoding · Multi-Head Attention · Dense Connections · Absolute Position Encodings
