Training Flexible Depth Model by Multi-Task Learning for Neural Machine Translation
Qiang Wang, Tong Xiao, Jingbo Zhu

TL;DR
This paper introduces a multi-task learning approach to train a neural machine translation model that can adapt to various network depths during inference, reducing maintenance and deployment costs across different devices.
Contribution
The authors propose a novel multi-task learning method enabling a single model to support multiple depth configurations for neural machine translation.
Findings
Supports 24 depth configurations simultaneously
Outperforms individual training methods
Superior to LayerDrop in flexibility and performance
Abstract
A standard neural machine translation model can only decode with the depth configuration it was trained with. Because of this restriction, and because hardware conditions vary greatly across terminal devices (e.g., mobile phones), we have to deploy models of various sizes to maintain the same translation latency. Such individual training increases model maintenance costs and slows model iteration, especially in industry. In this work, we propose to use multi-task learning to train a flexible depth model that can adapt to different depth configurations during inference. Experimental results show that our approach can simultaneously support decoding in 24 depth configurations and outperforms both individual training and LayerDrop, another method for training flexible depth models.
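As a rough illustration of the idea in the abstract, the sketch below treats every (encoder depth, decoder depth) pair as one task and averages the per-task losses over a shared set of layers. Everything specific here is an assumption made for illustration and is not stated in the text above: the candidate depth sets (chosen only so that the number of pairs comes to 24), the evenly spaced layer-selection rule, and the names `depth_configs`, `select_layers`, and `multitask_loss`.

```python
import random
from itertools import product

# Assumed candidate depths: 6 encoder options x 4 decoder options = 24 pairs.
# The abstract gives only the total (24), not these sets.
ENC_DEPTHS = (1, 2, 3, 4, 6, 12)
DEC_DEPTHS = (1, 2, 3, 6)


def depth_configs(enc_depths=ENC_DEPTHS, dec_depths=DEC_DEPTHS):
    """Enumerate every (encoder depth, decoder depth) pair, one task each."""
    return list(product(enc_depths, dec_depths))


def select_layers(full_depth, sub_depth):
    """Pick `sub_depth` evenly spaced layer indices out of `full_depth` layers.

    Hypothetical selection rule: keep the last layer of each equal-sized chunk.
    """
    if sub_depth <= 0 or full_depth % sub_depth != 0:
        raise ValueError("sub_depth must be a positive divisor of full_depth")
    stride = full_depth // sub_depth
    return list(range(stride - 1, full_depth, stride))


def multitask_loss(loss_fn, configs, sample_size=None, rng=random):
    """Average a per-configuration loss over all, or a random sample of, tasks.

    `loss_fn(enc_depth, dec_depth)` stands in for a forward pass of the shared
    model restricted to the selected layers.
    """
    chosen = configs if sample_size is None else rng.sample(configs, sample_size)
    return sum(loss_fn(m, n) for m, n in chosen) / len(chosen)


if __name__ == "__main__":
    configs = depth_configs()
    print(len(configs), "depth configurations")
    print("encoder layers used at depth 3:", select_layers(12, 3))
```

At inference time, under the same assumptions, a device would pick the pair that fits its latency budget and run only the layers returned by `select_layers`, so a single set of weights serves every configuration.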
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: LayerDrop
