Scaling Speculative Decoding with Lookahead Reasoning

Yichao Fu; Rui Ge; Zelei Shao; Zhijie Deng; Hao Zhang

arXiv:2506.19830 · cs.LG · June 25, 2025

TL;DR

Lookahead Reasoning enhances speculative decoding by introducing a step-level parallelism layer, significantly increasing decoding speed while maintaining answer quality across various benchmarks.

Contribution

This paper introduces Lookahead Reasoning, a novel method that combines step-level and token-level parallelism to surpass existing speculative decoding speed limits.

Findings

01

Improves the speedup of speculative decoding from 1.4x to 2.1x.

02

Maintains answer quality across multiple benchmarks.

03

Its speedup scales better with additional GPU throughput than token-level speculative decoding alone.

Abstract

Reasoning models excel by generating long chains of thought, but decoding the resulting thousands of tokens is slow. Token-level speculative decoding (SD) helps, but its benefit is capped, because the chance that an entire $γ$-token guess is correct falls exponentially as $γ$ grows. This means allocating more compute for longer token drafts faces an algorithmic ceiling, making the speedup modest and hardware-agnostic. We raise this ceiling with Lookahead Reasoning, which exploits a second, step-level layer of parallelism. Our key insight is that reasoning models generate step-by-step, and each step needs only to be semantically correct, not an exact token match. In Lookahead Reasoning, a lightweight draft model proposes several future steps; the target model expands each proposal in one batched pass, and a verifier keeps semantically correct steps while letting the target…
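
To make the "algorithmic ceiling" concrete, here is a minimal sketch in Python. It is not taken from the paper: it assumes each drafted token is accepted independently with the same probability `alpha`, a common simplification in the speculative decoding literature, and the function names are hypothetical. Under that assumption, the chance that a full $γ$-token draft survives is `alpha ** gamma`, and the expected number of tokens produced per target-model pass saturates at `1 / (1 - alpha)` no matter how long the draft is.

```python
# Illustrative model only: assumes i.i.d. per-token acceptance with
# probability alpha. Real acceptance rates vary by position and context.

def prob_full_draft_accepted(alpha: float, gamma: int) -> float:
    """Probability that all gamma drafted tokens are accepted."""
    return alpha ** gamma


def expected_tokens_per_pass(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Equals sum_{k=0}^{gamma} alpha^k: the accepted draft prefix plus the
    one token the target model always contributes.
    """
    if alpha == 1.0:
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def ceiling(alpha: float) -> float:
    """Limit of expected_tokens_per_pass as gamma grows (alpha < 1)."""
    return 1.0 / (1.0 - alpha)


# Show how longer drafts stop paying off for an assumed alpha of 0.8.
alpha = 0.8
for gamma in (1, 2, 4, 8, 16, 32):
    p_all = prob_full_draft_accepted(alpha, gamma)
    tokens = expected_tokens_per_pass(alpha, gamma)
    print(f"gamma={gamma:2d}  P(all accepted)={p_all:.4f}  "
          f"E[tokens/pass]={tokens:.3f}  ceiling={ceiling(alpha):.1f}")
```

In this toy model, spending more compute on longer drafts moves the expected tokens per pass toward the ceiling but never past it, which is the limitation the abstract attributes to token-level speculation alone.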

Taxonomy

Topics: Logic, Reasoning, and Knowledge · Advanced Algebra and Logic · Semantic Web and Ontologies