Dynamic Speculation Lookahead Accelerates Speculative Decoding of Large Language Models

Jonathan Mamou; Oren Pereg; Daniel Korat; Moshe Berchansky; Nadav Timor; Moshe Wasserblat; Roy Schwartz

arXiv:2405.04304·cs.CL·November 8, 2024·1 cites

Open Access

TL;DR

This paper introduces DISCO, a method that dynamically selects the speculation lookahead during speculative decoding of large language models, reaching a 10% average speedup over the best static-lookahead baseline while generating the exact same text.

Contribution

DISCO dynamically selects the speculation lookahead at each iteration rather than fixing it in advance, and it outperforms the best static-lookahead baseline in speculative decoding of large language models.

Findings

01

Achieves a 10% average speedup over the best static speculation lookahead baseline

02

Generates exactly the same text as the static-lookahead baseline

03

Demonstrates effectiveness across four datasets

Abstract

Speculative decoding is commonly used for reducing the inference latency of large language models. Its effectiveness depends heavily on the speculation lookahead (SL), the number of tokens generated by the draft model at each iteration. In this work, we show that the common practice of using the same SL for all iterations (static SL) is suboptimal. We introduce DISCO (DynamIc SpeCulation lookahead Optimization), a novel method for dynamically selecting the SL. Our experiments with four datasets show that DISCO reaches an average speedup of 10% compared to the best static SL baseline, while generating the exact same text.
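To make the role of the SL concrete, the following is a minimal toy sketch of greedy speculative decoding in which the SL is chosen by a pluggable function. Everything in it is an assumption made for illustration: `target_next` and `draft_next` are hypothetical stand-ins for real models, and the "dynamic" rule is an oracle that happens to know where the toy draft diverges. It is not DISCO's selection rule, which the abstract does not describe. The sketch only shows two things the abstract states: the SL is the number of draft tokens per iteration, and the generated text does not depend on how the SL is chosen.

```python
from typing import Callable, List, Tuple

Token = int
VOCAB = 50  # toy vocabulary size (assumption)


def target_next(prefix: List[Token]) -> Token:
    """Hypothetical target model: a deterministic function of the prefix."""
    return (sum(prefix) * 31 + len(prefix)) % VOCAB


def draft_next(prefix: List[Token]) -> Token:
    """Hypothetical draft model: agrees with the target except when
    len(prefix) % 4 == 3, where it proposes a different token."""
    t = target_next(prefix)
    return t if len(prefix) % 4 != 3 else (t + 1) % VOCAB


def greedy_decode(prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: the target model alone, one token per target call."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target_next(out))
    return out


def speculative_decode(
    prompt: List[Token],
    n_new: int,
    choose_sl: Callable[[List[Token]], int],
) -> Tuple[List[Token], int, int]:
    """Greedy speculative decoding with a pluggable SL policy.

    Returns (tokens, target_passes, drafted_tokens)."""
    out = list(prompt)
    target_passes = 0
    drafted = 0
    while len(out) - len(prompt) < n_new:
        sl = choose_sl(out)  # speculation lookahead for this iteration
        # 1) The draft model proposes `sl` tokens autoregressively.
        draft: List[Token] = []
        for _ in range(sl):
            draft.append(draft_next(out + draft))
        drafted += sl
        # 2) The target verifies the proposal. A real system scores all
        #    positions in one forward pass, so we count one pass here.
        target_passes += 1
        accepted: List[Token] = []
        for tok in draft:
            if tok != target_next(out + accepted):
                break  # first mismatch: discard the rest of the draft
            accepted.append(tok)
        # 3) The target always contributes one token of its own:
        #    a correction after a mismatch, or a bonus token otherwise.
        accepted.append(target_next(out + accepted))
        out.extend(accepted)
    return out[: len(prompt) + n_new], target_passes, drafted


def static_sl(k: int) -> Callable[[List[Token]], int]:
    """Static policy: the same SL at every iteration."""
    return lambda out: k


def oracle_sl(out: List[Token]) -> int:
    """Toy dynamic policy (NOT DISCO): draft exactly up to the position
    where this particular toy draft is known to diverge."""
    return (3 - len(out)) % 4


prompt = [1, 2, 3]
n_new = 20
baseline = greedy_decode(prompt, n_new)
static_out, static_passes, static_drafted = speculative_decode(prompt, n_new, static_sl(5))
dynamic_out, dynamic_passes, dynamic_drafted = speculative_decode(prompt, n_new, oracle_sl)
```

Because every emitted token is either a draft token the target agreed with or a token produced by the target itself, `static_out` and `dynamic_out` both equal `baseline`. Only the number of target passes and drafted tokens changes with the SL policy, which is where any speedup would come from.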

Taxonomy

Topics: Error Correcting Code Techniques · Advanced Data Compression Techniques · Chaos-based Image/Signal Encryption