TL;DR
This paper introduces dynamic multi-branch layers for on-device neural machine translation (NMT). At almost the same computational cost, the method improves translation quality by up to 1.7 BLEU on WMT14 En-De and 1.8 BLEU on WMT20 Zh-En.
Contribution
The paper proposes a layer-wise dynamic multi-branch network in which only one branch is activated during training and inference, together with shared-private reparameterization to ensure that each branch is trained sufficiently.
Findings
Up to 1.7 BLEU improvement on WMT14 En-De
Up to 1.8 BLEU improvement on WMT20 Zh-En
Up to 1.5x faster inference with the same number of parameters
Abstract
With the rapid development of artificial intelligence (AI), there is a trend in moving AI applications, such as neural machine translation (NMT), from cloud to mobile devices. Constrained by limited hardware resources and battery, the performance of on-device NMT systems is far from satisfactory. Inspired by conditional computation, we propose to improve the performance of on-device NMT systems with dynamic multi-branch layers. Specifically, we design a layer-wise dynamic multi-branch network with only one branch activated during training and inference. As not all branches are activated during training, we propose shared-private reparameterization to ensure sufficient training for each branch. At almost the same computational cost, our method achieves improvements of up to 1.7 BLEU points on the WMT14 English-German translation task and 1.8 BLEU points on the WMT20 Chinese-English…
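The two ideas in the abstract, activating a single branch per layer and sharing part of each branch's parameters, can be illustrated with a small sketch. This is not the authors' implementation. The class name `DynamicMultiBranchLinear`, the argmax gate, and the additive form `shared + private[i]` for the effective branch weight are all assumptions made for illustration; the abstract does not specify how the gate or the reparameterization is defined.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mat_add(A, B):
    """Element-wise sum of two matrices of equal shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


class DynamicMultiBranchLinear:
    """Hypothetical linear layer with several branches, one active per input."""

    def __init__(self, d_in, d_out, n_branches, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        # Assumption: each branch weight is a shared matrix plus a private one,
        # so the shared part is used no matter which branch is selected.
        self.shared = init(d_out, d_in)
        self.private = [init(d_out, d_in) for _ in range(n_branches)]
        # Assumption: a linear gate scores the branches from the input.
        self.gate = init(n_branches, d_in)

    def select_branch(self, x):
        """Return the index of the highest-scoring branch (hard argmax gate)."""
        scores = matvec(self.gate, x)
        return max(range(len(scores)), key=scores.__getitem__)

    def branch_weight(self, i):
        """Effective weight of branch i under the assumed reparameterization."""
        return mat_add(self.shared, self.private[i])

    def forward(self, x):
        """Run only the selected branch, so cost matches a single linear layer."""
        i = self.select_branch(x)
        return i, matvec(self.branch_weight(i), x)


layer = DynamicMultiBranchLinear(d_in=4, d_out=3, n_branches=2)
x = [1.0, 0.5, -0.2, 0.3]
branch, y = layer.forward(x)
```

In this sketch only one matrix-vector product is computed for the layer output regardless of the number of branches, which is the sense in which capacity grows while per-input compute stays roughly constant. The sum `shared + private[i]` could be precomputed once after training, so the assumed reparameterization would add no inference cost.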
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout · Residual Connection · Dense Connections · Adam · Layer Normalization · Label Smoothing
