Lossless Acceleration for Seq2seq Generation with Aggressive Decoding
Tao Ge, Heming Xia, Xin Sun, Si-Qing Chen, Furu Wei

TL;DR
This paper introduces Aggressive Decoding, a parallel decoding algorithm for seq2seq tasks that achieves significant speedups without quality loss by pairing aggressively drafted tokens with a verification step, both computed in parallel. It applies to tasks ranging from grammatical error correction to machine translation.
Contribution
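The paper proposes two Aggressive Decoding paradigms: Input-guided Aggressive Decoding (IAD) for seq2seq tasks whose inputs and outputs are highly similar, and a second paradigm for general seq2seq tasks. Both exploit parallel computing for lossless acceleration, running faster than previous methods while matching the quality of autoregressive decoding.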
Findings
7x-9x speedup in grammatical error correction and text simplification
3x-5x speedup in machine translation and summarization
Achieves identical or better quality compared to autoregressive decoding
Abstract
We study lossless acceleration for seq2seq generation with a novel decoding algorithm: Aggressive Decoding. Unlike previous efforts (e.g., non-autoregressive decoding) that speed up seq2seq generation at the cost of quality loss, our approach aims to yield identical (or better) generation compared with autoregressive decoding, but with a significant speedup, achieved by the cooperation of aggressive decoding and verification, both of which are efficient due to parallel computing. We propose two Aggressive Decoding paradigms for two kinds of seq2seq tasks: 1) for seq2seq tasks whose inputs and outputs are highly similar (e.g., Grammatical Error Correction), we propose Input-guided Aggressive Decoding (IAD), which aggressively copies from the input sentence as drafted decoded tokens to verify in parallel; 2) for other general seq2seq tasks (e.g., Machine Translation), we propose…
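The draft-and-verify idea behind IAD can be illustrated with a toy sketch. Everything named here is hypothetical and not taken from the paper's code: `toy_model`, `FIXES`, `EOS`, and the `next_token(source, prefix)` interface stand in for a real seq2seq model. The accept rule (keep the longest drafted prefix that matches the model's own greedy predictions, then take the model's token at the first mismatch) is an assumption of this sketch, and the draft is aligned to the input purely by position, which is a simplification. Verification is simulated with a loop; in a real model it would be a single parallel forward pass.

```python
from typing import Callable, List, Sequence, Tuple

EOS = "</s>"  # hypothetical end-of-sequence marker for this sketch

# Hypothetical interface: (source tokens, decoded prefix) -> greedy next token.
NextToken = Callable[[Sequence[str], Sequence[str]], str]


def greedy_decode(source: Sequence[str], next_token: NextToken,
                  max_len: int = 64) -> List[str]:
    """Plain autoregressive baseline: one model call per output token."""
    out: List[str] = []
    while len(out) < max_len:
        tok = next_token(source, out)
        if tok == EOS:
            break
        out.append(tok)
    return out


def input_guided_decode(source: Sequence[str], next_token: NextToken,
                        max_len: int = 64) -> Tuple[List[str], int]:
    """Draft by copying the input, then verify against the model's predictions.

    Returns the decoded tokens and the number of verification passes.
    """
    out: List[str] = []
    steps = 0
    while len(out) < max_len:
        steps += 1
        # Draft: copy the rest of the input (position-aligned), then EOS.
        draft = list(source[len(out):]) + [EOS]
        # Verify: the model's prediction at every drafted position.
        # Simulated sequentially here; a real model would do this in one pass.
        preds = [next_token(source, out + draft[:i]) for i in range(len(draft))]
        # Accept the longest drafted prefix that the model agrees with.
        n = 0
        while n < len(draft) and preds[n] == draft[n]:
            n += 1
        accepted = draft[:n]
        if n < len(draft):
            accepted.append(preds[n])  # model's own token at the first mismatch
        if EOS in accepted:
            out.extend(accepted[:accepted.index(EOS)])
            break
        out.extend(accepted)
    return out[:max_len], steps


# Toy stand-in for a grammatical error correction model (assumption):
# it copies the input token at the current position, fixing known errors.
FIXES = {"goed": "went"}


def toy_model(source: Sequence[str], prefix: Sequence[str]) -> str:
    i = len(prefix)
    if i >= len(source):
        return EOS
    return FIXES.get(source[i], source[i])


if __name__ == "__main__":
    src = "she goed to school yesterday".split()
    decoded, passes = input_guided_decode(src, toy_model)
    print(decoded, passes)
    print(decoded == greedy_decode(src, toy_model))
```

On this five-token input the sketch needs two verification passes, whereas the greedy baseline makes six sequential model calls, and both produce the same tokens. Because every accepted token equals what greedy decoding would have produced for the same prefix, the output is unchanged by construction; only the number of sequential steps drops.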
Taxonomy
Topics: Algorithms and Data Compression · Genomics and Phylogenetic Studies · Advanced Data Storage Technologies
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Layer Normalization · Byte Pair Encoding · Dense Connections · Dropout · Sigmoid Activation · Absolute Position Encodings
