When Drafts Evolve: Speculative Decoding Meets Online Learning

Yu-Yang Qian; Hao-Cong Wu; Yichao Fu; Hao Zhang; Peng Zhao

arXiv:2603.12617·cs.LG·March 16, 2026

Open Access

TL;DR

This paper introduces OnlineSpec, a framework that improves speculative decoding for large language models by adapting the draft model online from verification feedback, with reported speedups of up to 24%.

Contribution

The paper formalizes the connection between speculative decoding and online learning, and proposes algorithms that use the verification feedback to adapt the draft model and so increase the inference speedup.

Findings

01. Achieved up to 24% speedup on multiple benchmarks.

02. Developed algorithms with theoretical guarantees.

03. Demonstrated effective online adaptation of draft models.

Abstract

Speculative decoding has emerged as a widely adopted paradigm for accelerating large language model inference, where a lightweight draft model rapidly generates candidate tokens that are then verified in parallel by a larger target model. However, due to limited model capacity, drafts often struggle to approximate the target distribution, resulting in shorter acceptance lengths and diminished speedup. A key yet under-explored observation is that speculative decoding inherently provides verification feedback that quantifies the deviation between the draft and target models at no additional cost. This process naturally forms an iterative "draft commits-feedback provides-draft adapts" evolving loop, which precisely matches the online learning paradigm. Motivated by this connection, we propose OnlineSpec, a unified framework that systematically leverages interactive feedback to continuously…
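
The loop described in the abstract can be illustrated with a toy sketch. This is not the OnlineSpec algorithm, whose details are not given on this page. It assumes a context-free target distribution p over four tokens, a draft distribution q = softmax(theta), the standard accept/reject rule of speculative sampling, and a plain gradient step on the cross-entropy between p and q as the online update. All names (speculative_round, online_update, run, expected_acceptance) and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(weights, rng):
    """Draw one index with probability proportional to its weight."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def speculative_round(p, q, k, rng):
    """Draft up to k tokens from q and verify each against p.

    Returns (emitted_tokens, number_of_accepted_draft_tokens).
    """
    emitted = []
    for _ in range(k):
        x = sample(q, rng)
        # Accept the drafted token with probability min(1, p[x] / q[x]).
        if rng.random() < min(1.0, p[x] / q[x]):
            emitted.append(x)
        else:
            # On rejection, resample from the residual max(p - q, 0).
            residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
            emitted.append(sample(residual, rng))
            return emitted, len(emitted) - 1
    # All k drafts accepted: the target contributes one extra token.
    emitted.append(sample(p, rng))
    return emitted, k


def online_update(theta, p, lr):
    """Assumed update: one gradient step on cross-entropy(p, softmax(theta)).

    The gradient with respect to the logits is q - p.
    """
    q = softmax(theta)
    return [t - lr * (qi - pi) for t, qi, pi in zip(theta, q, p)]


def expected_acceptance(p, q):
    """Per-token acceptance probability: sum_x min(p[x], q[x])."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))


def run(rounds=200, k=4, lr=0.5, seed=0):
    rng = random.Random(seed)
    p = [0.6, 0.25, 0.1, 0.05]      # toy target distribution (assumption)
    theta = [0.0, 0.0, 0.0, 0.0]    # draft starts uniform (assumption)
    accepted = []
    for _ in range(rounds):
        q = softmax(theta)                        # draft commits
        _, n = speculative_round(p, q, k, rng)    # target verifies
        accepted.append(n)
        theta = online_update(theta, p, lr)       # draft adapts
    return accepted, softmax(theta)


if __name__ == "__main__":
    accepted, q_final = run()
    p = [0.6, 0.25, 0.1, 0.05]
    print("mean accepted, first 20 rounds:", sum(accepted[:20]) / 20)
    print("mean accepted, last 20 rounds: ", sum(accepted[-20:]) / 20)
    print("expected acceptance at the end:", expected_acceptance(p, q_final))
```

In this toy setting the per-token acceptance probability equals one minus the total variation distance between p and q, so any update that moves q toward p lengthens the expected accepted prefix. A real system would condition both distributions on the context and would only observe the target's probabilities at the verified positions.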

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Computational and Text Analysis Methods