Speculative Sampling with Reinforcement Learning

Chenan Wang; Daniel H. Shi; Haipeng Chen

arXiv:2601.12212 · cs.LG · January 21, 2026

TL;DR

This paper introduces Re-SpS, a reinforcement learning framework that dynamically adjusts the draft-tree hyperparameters of speculative sampling in large language models, reporting up to 5.45× speedup over the backbone LLM while maintaining output fidelity.

Contribution

Re-SpS is the first RL-based method for adaptive hyperparameter tuning in speculative sampling, enhancing efficiency across diverse contexts.

Findings

01. Achieves up to 5.45× speedup over the backbone LLM.

02. Achieves up to 1.12× speedup over the SOTA method EAGLE-3.

03. Maintains output fidelity across benchmarks.

Abstract

Inference-time latency remains an open challenge for real-world applications of large language models (LLMs). State-of-the-art (SOTA) speculative sampling (SpS) methods for LLMs, such as EAGLE-3, use tree-based drafting to explore multiple candidate continuations in parallel. However, the hyperparameters controlling the tree structure are static, which limits flexibility and efficiency across diverse contexts and domains. We introduce Reinforcement learning for Speculative Sampling (Re-SpS), the first reinforcement learning (RL)-based framework for draft tree hyperparameter optimization. Re-SpS dynamically adjusts draft tree hyperparameters in real time, learning context-aware policies that maximize generation speed by balancing speculative aggression against computational overhead. It leverages efficient state representations from target model hidden states and introduces multi-step…
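
To make the idea of context-aware hyperparameter selection concrete, here is a minimal sketch in Python. It is not the authors' method: it swaps in a contextual epsilon-greedy bandit as a much simpler stand-in for an RL policy, and it runs against a toy throughput simulator rather than a real LLM. The hyperparameter names (depth, top_k, total_tokens), the "easy"/"hard" context labels, the acceptance probabilities, and the cost model are all assumptions made for illustration; the abstract does not specify them.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TreeConfig:
    # Hypothetical draft-tree hyperparameters; the abstract does not name them.
    depth: int
    top_k: int
    total_tokens: int


# Hypothetical discrete action set: conservative, moderate, aggressive trees.
ACTIONS = [
    TreeConfig(depth=3, top_k=4, total_tokens=16),
    TreeConfig(depth=5, top_k=8, total_tokens=32),
    TreeConfig(depth=7, top_k=10, total_tokens=60),
]


def simulated_throughput(context, cfg, rng):
    """Toy reward (tokens per unit cost). Invented numbers, not measured data."""
    accept_p = 0.9 if context == "easy" else 0.4
    # Expected accepted length of a draft chain with i.i.d. acceptance.
    accepted = sum(accept_p ** i for i in range(1, cfg.depth + 1))
    tokens_per_step = 1.0 + accepted            # target model always adds one token
    step_cost = 1.0 + 0.02 * cfg.total_tokens   # larger trees cost more to verify
    return tokens_per_step / step_cost + rng.gauss(0.0, 0.05)


class EpsilonGreedyPolicy:
    """Keeps a running mean reward per (context, action) pair."""

    def __init__(self, contexts, n_actions, epsilon=0.1, seed=0):
        self.rng = random.Random(seed)
        self.epsilon = epsilon
        self.n_actions = n_actions
        self.value = {c: [0.0] * n_actions for c in contexts}
        self.count = {c: [0] * n_actions for c in contexts}

    def greedy(self, context):
        vals = self.value[context]
        return max(range(self.n_actions), key=vals.__getitem__)

    def select(self, context):
        # Try every action once before trusting the estimates.
        for a, n in enumerate(self.count[context]):
            if n == 0:
                return a
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        return self.greedy(context)

    def update(self, context, action, reward):
        self.count[context][action] += 1
        n = self.count[context][action]
        self.value[context][action] += (reward - self.value[context][action]) / n


def train(steps=4000, seed=0):
    rng = random.Random(seed)
    contexts = ["easy", "hard"]
    policy = EpsilonGreedyPolicy(contexts, len(ACTIONS), seed=seed)
    for _ in range(steps):
        context = rng.choice(contexts)
        action = policy.select(context)
        reward = simulated_throughput(context, ACTIONS[action], rng)
        policy.update(context, action, reward)
    return policy


if __name__ == "__main__":
    learned = train()
    for ctx in ("easy", "hard"):
        print(ctx, ACTIONS[learned.greedy(ctx)])
```

In this toy setting the learned policy picks a deeper tree when drafts are likely to be accepted and a shallower one when they are not, which mirrors the trade-off between speculative aggression and computational overhead described in the abstract. A static configuration would have to settle on one tree for both contexts.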

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Natural Language Processing Techniques