Depth Growing for Neural Machine Translation
Lijun Wu, Yiren Wang, Yingce Xia, Fei Tian, Fei Gao, Tao Qin, Jianhuang Lai, Tie-Yan Liu

TL;DR
This paper introduces a two-stage method with three components to effectively increase the depth of neural machine translation models, leading to significant translation quality improvements over strong Transformer baselines.
Contribution
The paper proposes a novel two-stage approach with three components to enable deeper NMT models, overcoming the limitations of simply stacking layers.
Findings
Deeper NMT models outperform shallower baselines.
The approach significantly improves translation quality on WMT14 tasks.
The method maintains performance without degradation when increasing depth.
Abstract
While very deep neural networks have shown effectiveness for computer vision and text classification applications, how to increase the network depth of neural machine translation (NMT) models for better translation quality remains a challenging problem. Directly stacking more blocks onto the NMT model yields no improvement and can even reduce performance. In this work, we propose an effective two-stage approach with three specially designed components to construct deeper NMT models, which results in significant improvements over the strong Transformer baselines on WMT English→German and English→French translation tasks. Our code is available at https://github.com/apeterswu/Depth_Growing_NMT.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
