The RoyalFlush System for the WMT 2022 Efficiency Task
Bo Qin, Aixin Jia, Qiang Wang, Jianning Lu, Shuqin Pan, Haibo Wang, Ming Chen

TL;DR
The paper introduces the RoyalFlush system for the WMT 2022 translation efficiency task. It uses a hybrid two-stage translation method that trades off speed against quality and reports an 80% improvement in inference speed.
Contribution
It proposes Hybrid Regression Translation (HRT), a novel two-stage translation approach combining autoregressive and non-autoregressive methods, with techniques to optimize speed and performance.
Findings
Achieves 80% faster inference speed
Maintains translation quality comparable to autoregressive models
Reaches over 6,000 words per second on GPU
Abstract
This paper describes the submission of the RoyalFlush neural machine translation system for the WMT 2022 translation efficiency task. Unlike the commonly used autoregressive translation system, we adopted a two-stage translation paradigm called Hybrid Regression Translation (HRT) to combine the advantages of autoregressive and non-autoregressive translation. Specifically, HRT first autoregressively generates a discontinuous sequence (e.g., make a prediction every k tokens, k > 1) and then fills in all previously skipped tokens at once in a non-autoregressive manner. Thus, we can easily trade off the translation quality and speed by adjusting k. In addition, by integrating other modeling techniques (e.g., sequence-level knowledge distillation and deep-encoder-shallow-decoder layer allocation strategy) and a mass of engineering efforts, HRT improves 80% inference speed and achieves…
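The two-stage decoding described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (hrt_decode, make_toy_model, skip_ar_step, fill_masks), the <mask> and <eos> symbols, and the choice of placing k - 1 masks before each first-stage token are all assumptions made for illustration, where k is the stride between autoregressive predictions. The "model" is a toy oracle that reads from a known reference sentence, so the sketch only shows the control flow and how the number of sequential decoder calls shrinks as k grows.

```python
from typing import Callable, Dict, List, Tuple

MASK = "<mask>"  # hypothetical placeholder for a skipped position
EOS = "<eos>"    # hypothetical end-of-sequence marker


def hrt_decode(
    skip_ar_step: Callable[[List[str]], str],
    fill_masks: Callable[[List[str]], List[str]],
    k: int,
    max_steps: int = 64,
) -> List[str]:
    """Sparse autoregressive pass, then a single parallel fill-in pass."""
    # Stage 1: sequential decoding, one call per k-th target token.
    sparse: List[str] = []
    for _ in range(max_steps):
        token = skip_ar_step(sparse)
        if token == EOS:
            break
        sparse.append(token)

    # Build the skeleton: k - 1 masks in front of every stage-1 token.
    skeleton: List[str] = []
    for token in sparse:
        skeleton.extend([MASK] * (k - 1))
        skeleton.append(token)

    # Stage 2: one non-autoregressive call fills every mask at once.
    return fill_masks(skeleton)


def make_toy_model(
    reference: List[str], k: int
) -> Tuple[Callable[[List[str]], str], Callable[[List[str]], List[str]], Dict[str, int]]:
    """Toy oracle standing in for a trained model; it also counts calls."""
    calls = {"ar": 0, "nar": 0}

    def skip_ar_step(prefix: List[str]) -> str:
        calls["ar"] += 1
        pos = (len(prefix) + 1) * k - 1  # index of the next k-th token
        return reference[pos] if pos < len(reference) else EOS

    def fill_masks(skeleton: List[str]) -> List[str]:
        calls["nar"] += 1
        return [reference[i] if tok == MASK else tok
                for i, tok in enumerate(skeleton)]

    return skip_ar_step, fill_masks, calls


if __name__ == "__main__":
    # Reference length is a multiple of each k used here; a real system
    # must also handle trailing tokens, which this sketch ignores.
    ref = "the quick brown fox jumps over".split()
    for stride in (1, 2, 3):
        step, fill, calls = make_toy_model(ref, stride)
        out = hrt_decode(step, fill, stride)
        print(stride, out, calls)
```

On the six-token toy sentence, k = 1 needs seven sequential calls (six tokens plus the stop step), while k = 2 needs four sequential calls plus one parallel fill. This is the speed side of the trade-off; the quality side cannot be shown with an oracle, since a real fill-in model can make mistakes on the skipped positions.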
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Cancer-related molecular mechanisms research
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Knowledge Distillation
