A Comparison of Transformer and Recurrent Neural Networks on Multilingual Neural Machine Translation
Surafel M. Lakew, Mauro Cettolo, Marcello Federico

TL;DR
This paper compares Transformer and Recurrent neural network architectures in multilingual neural machine translation, analyzing their translation quality, zero-shot capabilities, and the influence of language similarity using professional post-edits and standard metrics.
Contribution
It provides a detailed comparative analysis of Transformer and Recurrent models in multilingual NMT, including zero-shot translation and language proximity effects.
Findings
Transformers outperform Recurrent models in translation quality.
Zero-shot translation effectiveness depends on language similarity.
Multilingual models benefit from shared representations across languages.
Abstract
Recently, neural machine translation (NMT) has been extended to multilinguality, that is, to handle more than one translation direction with a single system. Multilingual NMT showed competitive performance against pure bilingual systems. Notably, in low-resource settings, it proved to work effectively and efficiently, thanks to a shared representation space that is forced across languages and induces a sort of transfer learning. Furthermore, multilingual NMT enables so-called zero-shot inference across language pairs never seen at training time. Despite the increasing interest in this framework, an in-depth analysis of what a multilingual NMT model is capable of and what it is not is still missing. Motivated by this, our work (i) provides a quantitative and comparative analysis of the translations produced by bilingual, multilingual and zero-shot systems; (ii) investigates the translation…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
