Joint-training on Symbiosis Networks for Deep Neural Machine Translation models
Zhengzhe Yu, Jiaxin Guo, Minghan Wang, Daimeng Wei, Hengchao Shang, Zongyao Li, Zhanglin Wu, Yuxia Wang, Yimeng Chen, Chang Su, Min Zhang, Lizhi Lei, Shimin Tao, Hao Yang

TL;DR
This paper introduces Symbiosis Networks for neural machine translation, enabling deeper models with better performance and efficiency through joint training of main and sub-networks with regularization.
Contribution
It proposes the Symbiosis Network architecture and a joint-training method with a regularization loss between the main network and its sub-network, improving deep Transformer models for NMT over classic training.
Findings
Improved BLEU scores on WMT'14 EN->DE, DE->EN, EN->FR tasks.
Transformer-deep (12-6) trained with the proposed strategy outperforms classically trained Transformer-deep (18-6).
Efficient training of deeper NMT models with regularization.
Abstract
Deep encoders have been proven effective in improving neural machine translation (NMT) systems, but translation quality reaches an upper bound when the number of encoder layers exceeds 18. Worse still, deeper networks consume a lot of memory, making them impossible to train efficiently. In this paper, we present Symbiosis Networks, which include a full network as the Symbiosis Main Network (M-Net) and another shared sub-network with the same structure but fewer layers as the Symbiotic Sub Network (S-Net). We adopt Symbiosis Networks on the Transformer-deep (m-n) architecture and define a particular regularization loss between the M-Net and S-Net in NMT. We apply joint training to the Symbiosis Networks with the aim of improving M-Net performance. Our proposed training strategy improves Transformer-deep (12-6) by 0.61, 0.49 and 0.69 BLEU over the baselines under…
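The joint-training idea in the abstract can be illustrated with a small sketch. The abstract does not spell out the regularization loss or which layers the S-Net shares, so everything specific here is an assumption made for illustration: the S-Net is taken to reuse the first layers of the M-Net encoder, the regularizer is taken to be a symmetric KL divergence between the two output distributions, and the names encode, joint_loss and alpha are hypothetical.

```python
import math

def encode(layers, x, depth):
    """Apply the first `depth` layers of a shared layer stack.

    Assumption: the S-Net reuses the leading layers of the M-Net,
    so both networks read from the same `layers` list.
    """
    for layer in layers[:depth]:
        x = layer(x)
    return x

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target):
    """Negative log-likelihood of the target token index."""
    return -math.log(probs[target])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def joint_loss(m_logits, s_logits, target, alpha=1.0):
    """Translation loss of both networks plus a consistency regularizer.

    Assumption: the regularizer is a symmetric KL term weighted by
    `alpha`; the paper's actual loss may differ.
    """
    p_m = softmax(m_logits)
    p_s = softmax(s_logits)
    ce_m = cross_entropy(p_m, target)  # M-Net loss
    ce_s = cross_entropy(p_s, target)  # S-Net loss
    reg = 0.5 * (kl_divergence(p_m, p_s) + kl_divergence(p_s, p_m))
    return ce_m + ce_s + alpha * reg

# Toy usage: a 12-layer stack where the S-Net uses the first 6 layers.
toy_layers = [lambda v: v + 1.0 for _ in range(12)]
m_out = encode(toy_layers, 0.0, depth=12)  # M-Net path
s_out = encode(toy_layers, 0.0, depth=6)   # S-Net path
loss = joint_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0], target=2)
```

In this sketch a single scalar loss drives both paths, so gradients from the S-Net and from the regularizer would also reach the shared layers; whether the paper weights or schedules these terms differently is not stated in the abstract.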
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
