Training Flexible Depth Model by Multi-Task Learning for Neural Machine Translation

Qiang Wang; Tong Xiao; Jingbo Zhu

arXiv:2010.08265 · cs.CL · October 19, 2020 · 1 citation

Open Access

TL;DR

This paper introduces a multi-task learning approach to train a neural machine translation model that can adapt to various network depths during inference, reducing maintenance and deployment costs across different devices.

Contribution

The authors propose a novel multi-task learning method enabling a single model to support multiple depth configurations for neural machine translation.
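The key property is that one set of weights must be usable at several depths. The sketch below is a hypothetical toy, not the authors' code: each "layer" is a single scalar weight applied as a residual update, and the decoder simply continues the computation on the encoder output (a stand-in for cross-attention). Decoding at a given (encoder depth, decoder depth) just runs the first layers of each shared stack. The 6 x 4 grid that yields 24 configurations is an assumption for illustration; the paper's exact depth grid is not stated here.

```python
from itertools import product
from typing import List

# HYPOTHETICAL toy: each layer is one scalar weight w applied as a residual
# update x -> x + w * x. A real NMT model would use Transformer layers here.


def run_stack(layers: List[float], x: float, depth: int) -> float:
    """Apply only the first `depth` layers of a shared layer stack."""
    if not 1 <= depth <= len(layers):
        raise ValueError(f"depth must be in [1, {len(layers)}], got {depth}")
    for w in layers[:depth]:
        x = x + w * x  # residual form: every prefix of the stack yields an output
    return x


def flexible_forward(enc_layers: List[float], dec_layers: List[float],
                     src: float, enc_depth: int, dec_depth: int) -> float:
    """Decode with one shared parameter set under an (enc_depth, dec_depth) choice."""
    memory = run_stack(enc_layers, src, enc_depth)   # truncated encoder
    return run_stack(dec_layers, memory, dec_depth)  # truncated decoder (simplified)


# Hypothetical grid: 6 encoder depths x 4 decoder depths = 24 configurations.
enc_layers = [0.1] * 6
dec_layers = [0.1] * 4
configs = list(product(range(1, 7), range(1, 5)))
outputs = {c: flexible_forward(enc_layers, dec_layers, 1.0, *c) for c in configs}
```

Because every configuration reads from the same `enc_layers` and `dec_layers`, only one model has to be stored and maintained; choosing a depth at inference is just a matter of choosing how many layers to run.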

Findings

01

Supports 24 depth configurations simultaneously

02

Outperforms individually training a separate model for each depth

03

Superior to LayerDrop in flexibility and performance
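For context on the LayerDrop baseline: as commonly described (Fan et al., 2019), LayerDrop skips each layer independently with some probability during training, then prunes to a fixed shallower sub-network at inference, for example by keeping every other layer. The toy below uses the same hypothetical scalar residual layers as a stand-in for Transformer layers; the function names are illustrative and are not fairseq's API.

```python
import random
from typing import List, Sequence


def layerdrop_mask(num_layers: int, p: float, rng: random.Random) -> List[bool]:
    """Training-time mask: each layer is dropped (False) independently with prob. p."""
    return [rng.random() >= p for _ in range(num_layers)]


def every_other_layer(num_layers: int) -> List[bool]:
    """One simple inference-time pruning rule: keep layers 0, 2, 4, ... (halves depth)."""
    return [i % 2 == 0 for i in range(num_layers)]


def run_masked_stack(layers: Sequence[float], x: float, mask: Sequence[bool]) -> float:
    """Toy residual stack; a dropped layer reduces to the identity (skip path)."""
    for w, keep in zip(layers, mask):
        if keep:
            x = x + w * x
    return x


rng = random.Random(0)
layers = [1.0] * 6
train_mask = layerdrop_mask(len(layers), p=0.2, rng=rng)  # resampled every step
y_train = run_masked_stack(layers, 1.0, train_mask)
y_pruned = run_masked_stack(layers, 1.0, every_other_layer(len(layers)))
```

One way to read the comparison: LayerDrop only sees random sub-networks during training and hopes the pruned one generalizes, while a multi-task setup can train the specific depth configurations that will actually be deployed. This contrast is an interpretation, not a claim taken from the paper's experiments.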

Abstract

The standard neural machine translation model can only decode with the same depth configuration as training. Because of this restriction, models of various sizes have to be deployed to keep the same translation latency, since hardware conditions on different terminal devices (e.g., mobile phones) may vary greatly. Such individual training leads to increased model maintenance costs and slower model iterations, especially in industry. In this work, we propose to use multi-task learning to train a flexible depth model that can adapt to different depth configurations during inference. Experimental results show that our approach can simultaneously support decoding in 24 depth configurations and is superior to both individual training and another flexible depth model training method, LayerDrop.
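Treating each depth configuration as a task suggests a joint objective over all configurations with shared parameters. The sketch below assumes the simplest version, an unweighted mean of per-configuration losses, on the same hypothetical scalar residual layers; the paper's actual task weighting, sampling of configurations, and model are not reproduced here. Gradients are taken by central finite differences only to keep the example dependency-free.

```python
from itertools import product
from typing import List, Sequence, Tuple

Config = Tuple[int, int]  # (encoder depth, decoder depth)


def run_stack(layers: Sequence[float], x: float, depth: int) -> float:
    """Toy residual stack: apply the first `depth` scalar layers x -> x + w * x."""
    for w in layers[:depth]:
        x = x + w * x
    return x


def task_loss(params: List[float], n_enc: int, data, cfg: Config) -> float:
    """Mean squared error of one depth configuration (one 'task')."""
    enc, dec = params[:n_enc], params[n_enc:]
    errs = [(run_stack(dec, run_stack(enc, s, cfg[0]), cfg[1]) - t) ** 2
            for s, t in data]
    return sum(errs) / len(errs)


def multitask_loss(params, n_enc, data, configs: Sequence[Config]) -> float:
    """Assumed objective: unweighted mean of per-task losses over all configs."""
    return sum(task_loss(params, n_enc, data, c) for c in configs) / len(configs)


def sgd_step(params, n_enc, data, configs, lr=0.01, eps=1e-6):
    """One gradient step on the shared parameters (central finite differences)."""
    grads = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        diff = (multitask_loss(up, n_enc, data, configs)
                - multitask_loss(down, n_enc, data, configs))
        grads.append(diff / (2 * eps))
    return [p - lr * g for p, g in zip(params, grads)]


# Usage on hypothetical data: 3 encoder + 2 decoder layers, 3 x 2 = 6 tasks.
n_enc = 3
params = [0.1] * 5
configs = list(product(range(1, 4), range(1, 3)))
data = [(1.0, 1.5), (2.0, 3.0)]  # target = 1.5 * source
before = multitask_loss(params, n_enc, data, configs)
for _ in range(20):
    params = sgd_step(params, n_enc, data, configs)
after = multitask_loss(params, n_enc, data, configs)
```

Every update moves the shared weights to reduce the average loss across all depths at once, which is what lets a single model serve all configurations instead of training one model per depth.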


Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: LayerDrop