Language Modeling, Lexical Translation, Reordering: The Training Process of NMT through the Lens of Classical SMT
Elena Voita, Rico Sennrich, Ivan Titov

TL;DR
This paper investigates how neural machine translation models acquire different translation skills during training. It finds a progression from target-side language modeling, through word-by-word translation, to complex reordering, and shows how this understanding can be applied in practice, for example to non-autoregressive NMT.
Contribution
It provides a detailed analysis of NMT training dynamics, linking them to traditional SMT components, and shows how this insight can improve non-autoregressive NMT.
Findings
NMT first learns target language modeling.
It then improves translation quality, approaching word-by-word translation.
Finally, it learns more complicated reordering patterns.
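One way to make the last stage concrete is to score how far a translation's word order departs from the source order. The sketch below is illustrative only and is not taken from the paper: it assumes a one-to-one word alignment is already available, and the function name `kendall_tau_distance` and the sample alignments are hypothetical. A score near 0 means a nearly monotone, word-by-word order; a higher score means more reordering.

```python
from itertools import combinations


def kendall_tau_distance(alignment):
    """Return the fraction of target-position pairs whose aligned source
    positions are out of order (0.0 = monotone, 1.0 = fully reversed).

    `alignment[k]` is the source position aligned to target word k.
    Assumption: the alignment is one-to-one; real alignments are messier.
    """
    n = len(alignment)
    if n < 2:
        return 0.0
    # Count inversions: pairs (i, j) with i < j but alignment[i] > alignment[j].
    inversions = sum(
        1 for i, j in combinations(range(n), 2) if alignment[i] > alignment[j]
    )
    total_pairs = n * (n - 1) / 2
    return inversions / total_pairs


# Hypothetical alignments for two translations of a four-word source sentence.
monotone = [0, 1, 2, 3]   # target follows the source word order
reordered = [0, 3, 1, 2]  # the last source word is moved forward

print(kendall_tau_distance(monotone))
print(kendall_tau_distance(reordered))
```

Tracking such a score over training checkpoints would be one possible way to observe when translations start to deviate from the source order, under the assumptions stated above.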
Abstract
Differently from the traditional statistical MT that decomposes the translation task into distinct separately learned components, neural machine translation uses a single neural network to model the entire translation process. Despite neural machine translation being de-facto standard, it is still not clear how NMT models acquire different competences over the course of training, and how this mirrors the different models in traditional SMT. In this work, we look at the competences related to three core SMT components and find that during training, NMT first focuses on learning target-side language modeling, then improves translation quality approaching word-by-word translation, and finally learns more complicated reordering patterns. We show that this behavior holds for several models and language pairs. Additionally, we explain how such an understanding of the training process can be…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
