A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models
Mahsa Khoshnoodi, Vinija Jain, Mingye Gao, Malavika Srikanth, Aman Chadha

TL;DR
This survey reviews recent methods to accelerate text generation in large language models, focusing on techniques like speculative decoding, early exiting, and non-autoregressive approaches to reduce inference latency.
Contribution
It categorizes and analyzes key acceleration techniques in autoregressive LLMs, providing insights and guidance for future research in efficient text generation.
Findings
Speculative decoding significantly reduces inference time.
Early exiting mechanisms improve efficiency with minimal accuracy loss.
Non-autoregressive methods offer promising speedups for large models.
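To make the first finding concrete, here is a minimal sketch of the draft-then-verify loop behind speculative decoding, using greedy verification. It is an illustration under stated assumptions, not the implementation of any method covered in the survey: `speculative_decode`, `greedy_decode`, `target_next`, and `draft_next` are hypothetical names, the two "models" are toy functions over integer tokens, and the bonus token that real systems usually take after a fully accepted draft is omitted for brevity.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # maps a context to its greedy next token


def greedy_decode(next_token: NextToken, prompt: List[Token], max_new_tokens: int) -> List[Token]:
    """Plain autoregressive decoding: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(next_token(tokens))
    return tokens


def speculative_decode(
    target_next: NextToken,
    draft_next: NextToken,
    prompt: List[Token],
    max_new_tokens: int,
    k: int = 4,
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding sketch.

    Returns the generated sequence and the number of verification rounds.
    In a real system each round is ONE batched forward pass of the target
    model over all drafted positions; here it is simulated with a loop.
    """
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    rounds = 0
    while len(tokens) < goal:
        n = min(k, goal - len(tokens))
        # 1) The cheap draft model proposes up to n tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(n):
            proposal.append(draft_next(tokens + proposal))
        # 2) The target model checks every drafted position.
        rounds += 1
        accepted: List[Token] = []
        for i, tok in enumerate(proposal):
            expected = target_next(tokens + proposal[:i])
            if tok == expected:
                accepted.append(tok)       # draft agreed with the target
            else:
                accepted.append(expected)  # keep the target's token, drop the rest
                break
        tokens.extend(accepted)
    return tokens, rounds


# Toy models (assumptions for illustration only).
def target_next(ctx: List[Token]) -> Token:
    return (ctx[-1] * 2 + 1) % 10


def draft_next(ctx: List[Token]) -> Token:
    # Agrees with the target except after token 7, where it guesses wrong.
    return 0 if ctx[-1] == 7 else (ctx[-1] * 2 + 1) % 10


output, rounds = speculative_decode(target_next, draft_next, prompt=[1], max_new_tokens=8, k=4)
reference = greedy_decode(target_next, [1], 8)
```

With greedy verification the output matches what the target model alone would have produced. Any speedup comes from needing fewer target passes than generated tokens, which depends on how often the draft model agrees with the target.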
Abstract
Despite the crucial importance of accelerating text generation in large language models (LLMs) for efficiently producing content, the sequential nature of this process often leads to high inference latency, posing challenges for real-time applications. Various techniques have been proposed and developed to address these challenges and improve efficiency. This paper presents a comprehensive survey of accelerated generation techniques in autoregressive language models, aiming to understand the state-of-the-art methods and their applications. We categorize these techniques into several key areas: speculative decoding, early exiting mechanisms, and non-autoregressive methods. We discuss each category's underlying principles, advantages, limitations, and recent advancements. Through this survey, we aim to offer insights into the current landscape of techniques in LLMs and provide guidance…
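As a rough illustration of the early exiting category named in the abstract, the sketch below stops computing layers once an intermediate prediction is confident enough. It is a simplified sketch under assumptions, not a method taken from the survey: `early_exit_predict`, the per-layer `heads`, and the max-softmax confidence threshold are all hypothetical choices, and the toy "layers" simply scale a vector of logits.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(
    layers: Sequence[Callable[[Vector], Vector]],
    heads: Sequence[Callable[[Vector], Vector]],
    hidden: Vector,
    threshold: float = 0.9,
) -> Tuple[int, int]:
    """Run layers in order and exit once the top-class probability reaches `threshold`.

    Returns (predicted_token, layers_used). The final layer always produces
    an answer, so the function falls back to full depth when no intermediate
    prediction is confident enough.
    """
    if not layers or len(layers) != len(heads):
        raise ValueError("need one exit head per layer and at least one layer")
    for depth, (layer, head) in enumerate(zip(layers, heads), start=1):
        hidden = layer(hidden)
        probs = softmax(head(hidden))
        confidence = max(probs)
        if confidence >= threshold or depth == len(layers):
            return probs.index(confidence), depth
    raise AssertionError("unreachable: the last layer always returns")


# Toy 4-layer "model" (assumption): each layer doubles the logits,
# so confidence grows with depth; each exit head is the identity.
toy_layers = [lambda h: [2.0 * x for x in h]] * 4
toy_heads = [lambda h: list(h)] * 4

token, used = early_exit_predict(toy_layers, toy_heads, [1.0, 0.0, 0.0], threshold=0.9)
token_full, used_full = early_exit_predict(toy_layers, toy_heads, [1.0, 0.0, 0.0], threshold=1.0)
```

Lowering the threshold exits earlier and saves more computation at the risk of more wrong predictions. Raising it toward 1.0 recovers the full-depth behaviour.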
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis
Methods: Early exiting using confidence measures
