Improving Zero-shot Neural Machine Translation on Language-specific Encoders-Decoders
Junwei Liao, Yu Shi, Ming Gong, Linjun Shou, Hong Qu, Michael Zeng

TL;DR
This paper improves zero-shot translation in language-specific encoder-decoder models by differentiating Transformer layers, selectively sharing parameters, and adding a denoising auto-encoding objective, achieving results competitive with universal NMT.
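As a rough illustration of what a denoising auto-encoding objective involves, the sketch below builds one training pair: a corrupted sentence as input and the clean sentence as the reconstruction target. The function names, the noise types (token dropping and masking), and the probabilities are assumptions chosen for illustration; the summary does not specify the noise the authors use.

```python
import random


def add_noise(tokens, drop_prob=0.1, mask_prob=0.1, mask_token="<mask>", seed=None):
    """Corrupt a token list by randomly dropping or masking tokens.

    Hypothetical noise function: the noise types and rates are assumptions,
    not taken from the paper.
    """
    rng = random.Random(seed)
    noisy = []
    for tok in tokens:
        r = rng.random()  # uniform in [0, 1)
        if r < drop_prob:
            continue  # drop this token
        if r < drop_prob + mask_prob:
            noisy.append(mask_token)  # hide this token
        else:
            noisy.append(tok)  # keep this token unchanged
    return noisy


def make_dae_example(tokens, **noise_kwargs):
    """Return (noisy_source, clean_target) for one denoising training pair."""
    return add_noise(tokens, **noise_kwargs), list(tokens)
```

In a multi-task setup, pairs like these would be mixed with ordinary translation pairs, with the encoder and decoder of the same language reconstructing the clean sentence.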
Contribution
The paper introduces an architecture that divides Transformer layers into language-specific layers and interlingua layers, improving zero-shot translation performance.
Findings
Achieves better zero-shot translation results than baseline models.
Enables incremental addition of new languages with minimal retraining.
Outperforms a strong pivot baseline in experiments.
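The incremental-addition finding can be pictured with a small sketch. It assumes that "minimal retraining" means freezing every existing module and training only the new language's encoder and decoder; that reading, the module naming scheme, and the `add_language` helper are all hypothetical.

```python
def add_language(trainable, new_lang):
    """Return a new module -> trainable map after adding `new_lang`.

    Assumption: all existing modules (including any shared ones) are frozen,
    and only the new language's encoder and decoder receive updates.
    """
    updated = {name: False for name in trainable}  # freeze everything already trained
    updated[f"{new_lang}_encoder"] = True          # new, trainable
    updated[f"{new_lang}_decoder"] = True          # new, trainable
    return updated


modules = {
    "en_encoder": True, "en_decoder": True,
    "de_encoder": True, "de_decoder": True,
    "interlingua": True,  # hypothetical name for the shared layers
}
modules_after = add_language(modules, "fr")
```

Because the shared layers stay fixed in this sketch, the new language has to map into the representation space the existing languages already use, which is what would make zero-shot pairs with them possible.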
Abstract
Recently, universal neural machine translation (NMT) with a shared encoder-decoder has achieved good performance on zero-shot translation. Unlike universal NMT, jointly trained language-specific encoders-decoders aim to achieve universal representation across non-shared modules, each of which serves one language or language family. The non-shared architecture has the advantage of mitigating internal language competition, especially when the shared vocabulary and model parameters are restricted in size. However, the zero-shot translation performance of multiple encoders and decoders still lags behind universal NMT. In this work, we study zero-shot translation using language-specific encoders-decoders. We propose to generalize the non-shared architecture and universal NMT by differentiating the Transformer layers between language-specific and interlingua. By selectively sharing…
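One way to picture the layer differentiation described in the abstract is sketched below. It is not the authors' implementation: `Layer` is a stand-in for a real Transformer layer, and the layer counts and the choice to stack shared interlingua layers on top of language-specific ones are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Stand-in for one Transformer layer; only its identity matters here."""
    name: str


def build_encoders(languages, n_specific, n_shared):
    """Build one encoder stack per language.

    Each language gets its own `n_specific` lower layers, while the same
    `n_shared` interlingua layer objects are reused in every stack
    (assumed placement: on top of the language-specific layers).
    """
    shared = [Layer(f"interlingua_{i}") for i in range(n_shared)]
    encoders = {}
    for lang in languages:
        specific = [Layer(f"{lang}_enc_{i}") for i in range(n_specific)]
        encoders[lang] = specific + shared  # shared objects, not copies
    return encoders


def count_unique_layers(encoders):
    """Count distinct layer objects across all stacks."""
    return len({id(layer) for stack in encoders.values() for layer in stack})


encoders = build_encoders(["en", "de", "fr"], n_specific=4, n_shared=2)
```

Setting `n_shared=0` gives a fully non-shared model and setting `n_specific=0` gives a fully shared encoder, which is the sense in which the design generalizes both extremes.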
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Attention Is All You Need · Softmax · Label Smoothing · Byte Pair Encoding · Dropout · Residual Connection · Multi-Head Attention
