Learning Language-Specific Layers for Multilingual Machine Translation

Telmo Pessoa Pires; Robin M. Schmidt; Yi-Hsiu Liao; Stephan Peitz

arXiv:2305.02665 · cs.CL · May 5, 2023 · 1 citation

TL;DR

This paper introduces Language-Specific Transformer Layers (LSLs) for multilingual machine translation. LSLs increase model capacity while keeping forward-pass computation constant, and improve translation quality by up to 1.9 chrF (2.2 spBLEU).

Contribution

The paper proposes making some Transformer encoder layers source- or target-language-specific while keeping the remaining layers shared, with the placement of the language-specific layers chosen via neural architecture search, to improve multilingual translation quality.

Findings

01

Improved translation quality by 1.3 chrF (1.5 spBLEU) with a separate-decoder architecture.

02

Improved translation quality by 1.9 chrF (2.2 spBLEU) with a shared-decoder architecture.

03

Increased model capacity while keeping the amount of computation and the number of parameters used in the forward pass constant.

Abstract

Multilingual Machine Translation promises to improve translation quality between non-English languages. This is advantageous for several reasons, namely lower latency (no need to translate twice), and reduced error cascades (e.g., avoiding losing gender and formality information when translating through English). On the downside, adding more languages reduces model capacity per language, which is usually countered by increasing the overall model size, making training harder and inference slower. In this work, we introduce Language-Specific Transformer Layers (LSLs), which allow us to increase model capacity, while keeping the amount of computation and the number of parameters used in the forward pass constant. The key idea is to have some layers of the encoder be source or target language-specific, while keeping the remaining layers shared. We study the best way to place these layers…
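The key idea in the abstract (some encoder layers indexed by source or target language, the rest shared) can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration and not the authors' implementation: `ToyLayer` is a stand-in for a full Transformer layer, the names `LanguageSpecificLayer` and `ToyEncoder` are hypothetical, and the `layout` used below is arbitrary rather than the placement the paper finds.

```python
import math
import random
from typing import Dict, List, Union

Vector = List[float]


class ToyLayer:
    """Toy stand-in for a full Transformer layer: y = x + tanh(W x)."""

    def __init__(self, dim: int, rng: random.Random) -> None:
        self.weight = [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
                       for _ in range(dim)]

    def num_params(self) -> int:
        return sum(len(row) for row in self.weight)

    def __call__(self, x: Vector) -> Vector:
        return [xi + math.tanh(sum(w * xj for w, xj in zip(row, x)))
                for xi, row in zip(x, self.weight)]


class LanguageSpecificLayer:
    """Holds one sublayer per language; only one is used per forward pass."""

    def __init__(self, dim: int, languages: List[str], index_by: str,
                 rng: random.Random) -> None:
        if index_by not in ("source", "target"):
            raise ValueError("index_by must be 'source' or 'target'")
        self.index_by = index_by
        self.sublayers: Dict[str, ToyLayer] = {
            lang: ToyLayer(dim, rng) for lang in languages
        }

    def total_params(self) -> int:
        # Capacity grows with the number of languages.
        return sum(layer.num_params() for layer in self.sublayers.values())

    def active_params(self) -> int:
        # Parameters touched in one forward pass: a single sublayer.
        return next(iter(self.sublayers.values())).num_params()

    def __call__(self, x: Vector, src: str, tgt: str) -> Vector:
        lang = src if self.index_by == "source" else tgt
        return self.sublayers[lang](x)


class ToyEncoder:
    """Stack of layers; `layout` marks each as 'shared', 'source' or 'target'."""

    def __init__(self, dim: int, languages: List[str], layout: List[str],
                 seed: int = 0) -> None:
        rng = random.Random(seed)
        self.layers: List[Union[ToyLayer, LanguageSpecificLayer]] = [
            ToyLayer(dim, rng) if kind == "shared"
            else LanguageSpecificLayer(dim, languages, kind, rng)
            for kind in layout
        ]

    def total_params(self) -> int:
        return sum(l.total_params() if isinstance(l, LanguageSpecificLayer)
                   else l.num_params() for l in self.layers)

    def active_params(self) -> int:
        return sum(l.active_params() if isinstance(l, LanguageSpecificLayer)
                   else l.num_params() for l in self.layers)

    def __call__(self, x: Vector, src: str, tgt: str) -> Vector:
        for layer in self.layers:
            if isinstance(layer, LanguageSpecificLayer):
                x = layer(x, src, tgt)
            else:
                x = layer(x)
        return x


languages = ["de", "en", "fr"]
layout = ["source", "shared", "shared", "target"]  # arbitrary, for illustration
lsl_encoder = ToyEncoder(4, languages, layout)
shared_encoder = ToyEncoder(4, languages, ["shared"] * 4)

print(lsl_encoder.total_params(), shared_encoder.total_params())    # 128 64
print(lsl_encoder.active_params(), shared_encoder.active_params())  # 64 64
```

In this toy setup the encoder with language-specific layers stores twice as many parameters as the fully shared one, yet each forward pass touches the same number of parameters in both, which is the capacity-versus-computation trade-off the abstract describes.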

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Adam · Layer Normalization · Linear Layer · Label Smoothing · Dropout · Byte Pair Encoding · Multi-Head Attention · Dense Connections · Absolute Position Encodings