Test-Time Speculation

Avinash Kumar; Sujay Sanghavi; Poulami Das

arXiv:2605.09329·cs.CL·May 20, 2026

TL;DR

Test-Time Speculation (TTS) is an online distillation method that adapts the speculator (draft model) during inference, using test-time feedback to keep speculative decoding effective on long-response tasks. It raises acceptance lengths by up to 72% over state-of-the-art speculators.

Contribution

The paper introduces TTS, an online adaptation technique that improves speculative decoding by continuously updating the draft model during inference.

Findings

01. TTS increases acceptance lengths by up to 72% over state-of-the-art speculators.

02. Acceptance lengths decline with generation length in existing methods, limiting long-response performance.

03. TTS maintains higher acceptance lengths across multiple models and longer outputs.

Abstract

Speculative decoding accelerates LLM inference by using a fast draft model to generate tokens and a more accurate target model to verify them. Its performance depends on the acceptance length, i.e., the number of draft tokens accepted by the target. Our studies show that the acceptance length of even state-of-the-art speculators, such as DFlash, EAGLE-3, and PARD, degrades with generation length, reaching values close to 1 (i.e., no speedup) within just a few thousand output tokens, which makes these speculators ineffective for long-response tasks. Acceptance lengths decline because most speculators are trained offline on short sequences but are forced to match the target model on much longer outputs at inference, well beyond their training distribution. To address this issue, we propose Test-Time Speculation (TTS), an online distillation approach that continuously adapts the speculator…
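To make the acceptance-length metric concrete, here is a minimal sketch of one verification step, assuming greedy (exact-match) verification. The function names and the token IDs are hypothetical and are not taken from the paper; the convention of counting one extra target token per step is also an assumption, chosen because it matches the abstract's remark that a value close to 1 means no speedup.

```python
from typing import List


def accepted_prefix_len(draft: List[int], target: List[int]) -> int:
    """Count the leading draft tokens that match the target's own choices.

    Assumption: greedy verification, where a draft token is accepted only if
    it equals the token the target model would have produced at that position.
    """
    n = 0
    for d, t in zip(draft, target):
        if d != t:
            break  # first mismatch ends the accepted prefix
        n += 1
    return n


def tokens_per_step(draft: List[int], target: List[int]) -> int:
    """Tokens emitted by one draft-then-verify step.

    `target` is assumed to hold len(draft) + 1 entries: the target's choice at
    every drafted position plus one more. After the accepted prefix, the
    target always contributes one token of its own (the correction at the
    mismatch, or a bonus token if every draft token was accepted), so the
    result is at least 1.
    """
    return accepted_prefix_len(draft, target) + 1


# Hypothetical token IDs: the first two draft tokens match, the third does not.
draft_tokens = [5, 7, 9, 2]
target_tokens = [5, 7, 1, 4, 8]
step_tokens = tokens_per_step(draft_tokens, target_tokens)
```

Under this convention, a speculator whose first draft token is always rejected yields exactly one token per step, which is the same output rate as decoding with the target model alone.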
