Improving Zero-shot Neural Machine Translation on Language-specific Encoders-Decoders

Junwei Liao; Yu Shi; Ming Gong; Linjun Shou; Hong Qu; Michael Zeng

arXiv:2102.06578·cs.CL·February 15, 2021

TL;DR

This paper improves zero-shot translation in language-specific encoder-decoder models by differentiating Transformer layers, selectively sharing parameters, and training with a denoising auto-encoding objective, achieving results competitive with universal NMT.

Contribution

The paper introduces an architecture that differentiates Transformer layers into language-specific and interlingua layers, improving zero-shot translation performance.

Findings

01. Achieves better zero-shot translation results than baseline models.

02. Enables incremental addition of new languages with minimal retraining.

03. Outperforms a strong pivot baseline in experiments.

Abstract

Recently, universal neural machine translation (NMT) with shared encoder-decoder gained good performance on zero-shot translation. Unlike universal NMT, jointly trained language-specific encoders-decoders aim to achieve universal representation across non-shared modules, each of which is for a language or language family. The non-shared architecture has the advantage of mitigating internal language competition, especially when the shared vocabulary and model parameters are restricted in their size. However, the performance of using multiple encoders and decoders on zero-shot translation still lags behind universal NMT. In this work, we study zero-shot translation using language-specific encoders-decoders. We propose to generalize the non-shared architecture and universal NMT by differentiating the Transformer layers between language-specific and interlingua. By selectively sharing…
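
The layer differentiation described in the abstract can be illustrated with a small parameter-sharing sketch. This is not the authors' implementation: the names `Layer` and `EncoderBank`, the parameters `num_layers` and `num_shared`, and the choice to place the shared (interlingua) layers at the top of the encoder stack are all assumptions made for illustration, since the truncated abstract does not say which layers are shared.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """Stand-in for one Transformer layer; `owner` records who holds its parameters."""
    owner: str
    index: int


@dataclass
class EncoderBank:
    """Hypothetical container for per-language encoders with some shared layers."""
    num_layers: int  # total layers in every encoder stack (assumed value)
    num_shared: int  # how many of those layers are shared across languages (assumed)
    shared: list = field(default_factory=list)
    specific: dict = field(default_factory=dict)

    def __post_init__(self):
        # Assumption: the shared "interlingua" layers sit at the top of the stack.
        first_shared = self.num_layers - self.num_shared
        self.shared = [Layer("interlingua", i)
                       for i in range(first_shared, self.num_layers)]

    def add_language(self, lang):
        # Only the language-specific layers are newly created for a language;
        # the interlingua layers are reused as-is.
        n_specific = self.num_layers - self.num_shared
        self.specific[lang] = [Layer(lang, i) for i in range(n_specific)]

    def stack(self, lang):
        # Full encoder for `lang`: its own layers followed by the shared ones.
        return self.specific[lang] + self.shared


bank = EncoderBank(num_layers=6, num_shared=2)
bank.add_language("en")
bank.add_language("de")
print([layer.owner for layer in bank.stack("de")])
```

In this sketch, setting `num_shared=0` gives fully separate encoders (the non-shared architecture), while `num_shared=num_layers` gives a single encoder used by every language (as in universal NMT), which is one way to read the abstract's claim that the proposal generalizes both.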

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Attention Is All You Need · Softmax · Label Smoothing · Byte Pair Encoding · Dropout · Residual Connection · Multi-Head Attention