The RoyalFlush System for the WMT 2022 Efficiency Task

Bo Qin; Aixin Jia; Qiang Wang; Jianning Lu; Shuqin Pan; Haibo Wang; Ming Chen

arXiv:2212.01543·cs.CL·December 6, 2022

TL;DR

The paper describes the RoyalFlush submission to the WMT 2022 translation efficiency task. The system uses a hybrid two-stage translation method that trades off speed against quality, and the authors report an 80% improvement in inference speed.

Contribution

It adopts Hybrid Regression Translation (HRT), a two-stage approach that combines autoregressive and non-autoregressive decoding, and pairs it with sequence-level knowledge distillation, a deep-encoder-shallow-decoder layer allocation, and engineering optimizations.

Findings

01

Improves inference speed by 80%

02

Maintains translation quality comparable to autoregressive models

03

Reaches over 6,000 words per second on GPU

Abstract

This paper describes the submission of the RoyalFlush neural machine translation system to the WMT 2022 translation efficiency task. Unlike commonly used autoregressive translation systems, we adopt a two-stage translation paradigm called Hybrid Regression Translation (HRT) to combine the advantages of autoregressive and non-autoregressive translation. Specifically, HRT first autoregressively generates a discontinuous sequence (e.g., making a prediction every $k$ tokens, $k > 1$) and then fills in all previously skipped tokens at once in a non-autoregressive manner. Thus, we can easily trade off translation quality and speed by adjusting $k$. In addition, by integrating other modeling techniques (e.g., sequence-level knowledge distillation and a deep-encoder-shallow-decoder layer allocation strategy) and substantial engineering effort, HRT improves inference speed by 80% and achieves…
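The two-stage decoding described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the function `hrt_decode`, the `MASK` symbol, the oracle stand-ins for the two decoders, and the assumption that the target length is known in advance are all hypothetical simplifications chosen to show the control flow only.

```python
from typing import Callable, List, Tuple

MASK = "<mask>"  # hypothetical placeholder for a skipped position


def hrt_decode(
    length: int,
    k: int,
    predict_next: Callable[[List[str], int], str],
    fill_all: Callable[[List[str]], List[str]],
) -> Tuple[List[str], List[str], int]:
    """Toy two-stage decoding loop.

    Stage 1 predicts every k-th position one step at a time (autoregressive).
    Stage 2 fills every remaining position in a single call (non-autoregressive).
    Assumes the target length is known, which a real system would not know.
    """
    skeleton = [MASK] * length
    ar_steps = 0
    for pos in range(k - 1, length, k):
        # Condition only on tokens already produced in stage 1.
        prefix = [skeleton[p] for p in range(k - 1, pos, k)]
        skeleton[pos] = predict_next(prefix, pos)
        ar_steps += 1
    filled = fill_all(list(skeleton))  # one parallel pass over all masks
    return filled, skeleton, ar_steps


# Oracle stand-ins for the two decoders (assumption: they copy a reference).
REFERENCE = "the cat sat on the mat today".split()


def oracle_next(prefix: List[str], pos: int) -> str:
    return REFERENCE[pos]


def oracle_fill(skeleton: List[str]) -> List[str]:
    return [REFERENCE[i] if tok == MASK else tok for i, tok in enumerate(skeleton)]


if __name__ == "__main__":
    output, skeleton, steps = hrt_decode(len(REFERENCE), 2, oracle_next, oracle_fill)
    print(skeleton)
    print(output)
    print(steps)
```

In this sketch a target of $n$ tokens needs $\lfloor n/k \rfloor$ sequential steps plus one parallel fill pass, compared with $n$ sequential steps for plain autoregressive decoding. A larger $k$ reduces the sequential work but leaves more tokens to the parallel pass, which is the quality and speed trade-off the abstract refers to. The count describes decoder passes in the toy loop only, not measured throughput.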


Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Cancer-related molecular mechanisms research

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Knowledge Distillation