Dynamic Multi-Branch Layers for On-Device Neural Machine Translation

Zhixing Tan; Zeyuan Yang; Meng Zhang; Qun Liu; Maosong Sun; Yang Liu

arXiv:2105.06679·cs.CL·March 18, 2022

TL;DR

This paper introduces dynamic multi-branch layers for on-device neural machine translation, reporting gains of up to 1.7 BLEU on WMT14 En-De and 1.8 BLEU on WMT20 Zh-En at almost the same computational cost, and up to 1.5x faster inference at the same parameter count.

Contribution

The paper proposes a layer-wise dynamic multi-branch network in which only one branch is activated during training and inference, together with shared-private reparameterization to ensure that each branch is sufficiently trained.

Findings

01. Up to 1.7 BLEU improvement on WMT14 En-De

02. Up to 1.8 BLEU improvement on WMT20 Zh-En

03. Up to 1.5x faster inference with the same number of parameters

Abstract

With the rapid development of artificial intelligence (AI), there is a trend in moving AI applications, such as neural machine translation (NMT), from cloud to mobile devices. Constrained by limited hardware resources and battery, the performance of on-device NMT systems is far from satisfactory. Inspired by conditional computation, we propose to improve the performance of on-device NMT systems with dynamic multi-branch layers. Specifically, we design a layer-wise dynamic multi-branch network with only one branch activated during training and inference. As not all branches are activated during training, we propose shared-private reparameterization to ensure sufficient training for each branch. At almost the same computational cost, our method achieves improvements of up to 1.7 BLEU points on the WMT14 English-German translation task and 1.8 BLEU points on the WMT20 Chinese-English…
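
The two ideas named in the abstract, activating a single branch per layer and shared-private reparameterization, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the class name `DynamicMultiBranchLinear`, the softmax gate, the scaling of the output by the gate probability, and the reading of reparameterization as "branch weight = shared weight + private weight" are all assumptions made for illustration.

```python
import numpy as np


class DynamicMultiBranchLinear:
    """Hypothetical sketch of a linear layer with several branches, one active per token."""

    def __init__(self, d_in, d_out, n_branches, seed=0):
        rng = np.random.default_rng(seed)
        # Assumed reparameterization: effective weight of branch k = shared + private[k].
        self.shared = rng.normal(0.0, 0.1, size=(d_in, d_out))
        self.private = rng.normal(0.0, 0.1, size=(n_branches, d_in, d_out))
        # Assumed gate: a linear scorer over branches followed by a softmax.
        self.gate = rng.normal(0.0, 0.1, size=(d_in, n_branches))

    def branch_weight(self, k):
        # The shared term is part of every branch, so it is used whichever branch is chosen.
        return self.shared + self.private[k]

    def forward(self, x):
        # x has shape (n_tokens, d_in).
        logits = x @ self.gate
        logits = logits - logits.max(axis=-1, keepdims=True)  # numerically stable softmax
        probs = np.exp(logits)
        probs = probs / probs.sum(axis=-1, keepdims=True)
        choice = probs.argmax(axis=-1)  # exactly one branch per token
        out = np.zeros((x.shape[0], self.shared.shape[1]))
        for k in np.unique(choice):
            rows = choice == k
            # Only the selected branch is evaluated for these tokens.
            # Scaling by the gate probability is an assumption of this sketch.
            out[rows] = (x[rows] @ self.branch_weight(k)) * probs[rows, k][:, None]
        return out, choice


layer = DynamicMultiBranchLinear(d_in=8, d_out=8, n_branches=4)
tokens = np.random.default_rng(1).normal(size=(5, 8))
out, choice = layer.forward(tokens)
print(out.shape, choice)
```

Under these assumptions, each token pays for one matrix product regardless of how many branches exist, which is consistent with the abstract's "almost the same computational cost". The shared term would receive a training signal from every token, while each private term is updated only by the tokens routed to its branch.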

Code & Models

Repositories

thunlp-mt/transformer-dmb (PyTorch, official)

Taxonomy

Methods

Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout · Residual Connection · Dense Connections · Adam · Layer Normalization · Label Smoothing