LogitSpec: Accelerating Retrieval-based Speculative Decoding via Next Next Token Speculation

Tianyu Liu; Qitan Lv; Hao Li; Xing Gao; Xiao Sun; Xiaoyan Sun

arXiv:2507.01449·cs.CL·April 30, 2026

TL;DR

LogitSpec enhances retrieval-based speculative decoding for large language models by using the last token's logit to speculate the next next token, reaching up to 2.61× inference speedup and 3.28 mean accepted tokens per decoding step.

Contribution

The paper introduces a training-free, plug-and-play method that expands the retrieval range by speculating the next next token from the last logit, improving decoding efficiency.
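The idea of widening retrieval with a guessed next next token can be illustrated with a small sketch. This is not the authors' implementation: the function names (`top_k_ids`, `retrieve_continuation`, `draft_candidates`), the top-k choice of guesses, and the bigram match against the token history are all assumptions made for illustration.

```python
from typing import List, Sequence


def top_k_ids(logits: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest logits, highest first."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


def retrieve_continuation(history: Sequence[int], query: Sequence[int], max_len: int) -> List[int]:
    """Return up to max_len tokens that followed the most recent match of query in history."""
    n = len(query)
    # Scan right to left so that the most recent match wins.
    for start in range(len(history) - n, -1, -1):
        if list(history[start:start + n]) == list(query):
            tail = list(history[start + n:start + n + max_len])
            if tail:
                return tail
    return []


def draft_candidates(
    history: Sequence[int],
    next_token: int,
    last_logits: Sequence[float],
    k: int = 2,
    max_len: int = 3,
) -> List[List[int]]:
    """Assumed scheme: each top-k entry of the last logit is a guess for the next next token.

    The query [next_token, guess] is matched against the history, and the
    draft is the guess followed by whatever tokens came after the match.
    """
    drafts = []
    for guess in top_k_ids(last_logits, k):
        tail = retrieve_continuation(history, [next_token, guess], max_len)
        drafts.append([guess] + tail)
    return drafts


history = [5, 6, 7, 8, 1, 2]
last_logits = [0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0]
print(draft_candidates(history, next_token=5, last_logits=last_logits))
```

In this toy run the guess `6` extends the query to `[5, 6]`, which matches the history and yields a longer draft, while the guess `3` finds no match and contributes only itself. A query on `next_token` alone would have a single token of context to match on, which is the narrower retrieval range the contribution statement refers to.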

Findings

01. Achieves up to 2.61× speedup in inference

02. Increases mean accepted tokens per decoding step to 3.28

03. Demonstrates effectiveness across various text generation benchmarks

Abstract

Speculative decoding (SD), where a small draft model is employed to propose draft tokens in advance and then the target model validates them in parallel, has emerged as a promising technique for LLM inference acceleration. Many efforts to improve SD aim to eliminate the need for a draft model and generate draft tokens in a retrieval-based manner, in order to further alleviate the drafting overhead and significantly reduce the difficulty of deployment and application. However, retrieval-based SD relies on a matching paradigm to retrieve the most relevant reference as the draft tokens, and these methods often fail to find matched and accurate draft tokens. To address this challenge, we propose LogitSpec to effectively expand the retrieval range and find the most relevant reference as drafts. Our LogitSpec is motivated by the observation that the logit of the last token can not only…
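The draft-then-verify loop described in the abstract can be sketched as follows, assuming greedy decoding. The `target_next` callable is a hypothetical stand-in for the target model, and the sequential calls are for readability only: a real system would score every draft position in a single parallel forward pass.

```python
from typing import Callable, List, Sequence


def verify_greedy(
    context: Sequence[int],
    draft: Sequence[int],
    target_next: Callable[[List[int]], int],
) -> List[int]:
    """Return the tokens emitted by one verification step under greedy decoding.

    Draft tokens are kept while they agree with the target model's own choice.
    At the first disagreement the target's token is emitted instead. If the
    whole draft is accepted, the target contributes one extra token.
    """
    prefix = list(context)
    emitted: List[int] = []
    for token in draft:
        expected = target_next(prefix)
        if token != expected:
            emitted.append(expected)  # correction from the target model
            return emitted
        emitted.append(token)
        prefix.append(token)
    emitted.append(target_next(prefix))  # extra token after full acceptance
    return emitted


def toy_target(prefix: List[int]) -> int:
    """Hypothetical target model: always predicts (last token + 1) mod 10."""
    return (prefix[-1] + 1) % 10


print(verify_greedy([1, 2, 3], [4, 5, 9], toy_target))
```

Every step emits at least one token, so a poor draft costs little, while a well-matched draft emits several tokens per target pass. That count per step is what the "mean accepted tokens" figure measures.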

Code & Models

Repositories

smart-lty/LogitSpec (GitHub)
