ReSpec: Towards Optimizing Speculative Decoding in Reinforcement Learning Systems

Qiaoling Chen; Zijun Liu; Peng Sun; Shenggui Li; Guoteng Wang; Ziming Liu; Yonggang Wen; Siyuan Feng; Tianwei Zhang

arXiv:2510.26475·cs.LG·October 31, 2025

TL;DR

ReSpec is a system that adapts speculative decoding to reinforcement learning training of large language models, achieving up to 4.5x speedup on Qwen models (3B-14B) while preserving reward convergence and training stability.

Contribution

ReSpec integrates speculative decoding into RL training of LLMs through three complementary mechanisms: dynamic tuning of speculative decoding configurations, evolving the drafter via knowledge distillation, and weighting updates by rollout rewards.

Findings

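01. Achieves up to 4.5x training speedup on Qwen models (3B-14B)

02. Maintains reward convergence and training stability

03. Addresses three gaps in applying speculative decoding to RL systems: diminishing speedups at large batch sizes, drafter staleness under continual actor updates, and drafter-induced policy degradation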

Abstract

Adapting large language models (LLMs) via reinforcement learning (RL) is often bottlenecked by the generation stage, which can consume over 75% of the training time. Speculative decoding (SD) accelerates autoregressive generation in serving systems, but its behavior under RL training remains largely unexplored. We identify three critical gaps that hinder the naive integration of SD into RL systems: diminishing speedups at large batch sizes, drafter staleness under continual actor updates, and drafter-induced policy degradation. To address these gaps, we present ReSpec, a system that adapts SD to RL through three complementary mechanisms: dynamically tuning SD configurations, evolving the drafter via knowledge distillation, and weighting updates by rollout rewards. On Qwen models (3B-14B), ReSpec achieves up to 4.5x speedup while preserving reward convergence and training stability,…
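
The abstract does not spell out how the drafter is distilled or how rollout rewards enter the update, so the following is only an illustrative sketch, not the authors' implementation. It assumes a forward KL divergence from the actor's next-token distribution to the drafter's, averaged over token positions, with each rollout's contribution scaled by its normalized non-negative reward. The function names (`softmax`, `kl_divergence`, `reward_weighted_distill_loss`) and the nested-list logit layout are hypothetical choices made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * (math.log(pi + eps) - math.log(qi + eps)) for pi, qi in zip(p, q)
    )


def reward_weighted_distill_loss(actor_logits, drafter_logits, rewards):
    """Reward-weighted distillation loss (assumed form, see lead-in).

    actor_logits, drafter_logits: nested lists indexed as
        [rollout][token position][vocabulary entry].
    rewards: one non-negative scalar per rollout.
    """
    total_reward = sum(rewards)
    if total_reward > 0:
        weights = [r / total_reward for r in rewards]
    else:
        # Assumption: fall back to uniform weights when no rollout is rewarded.
        weights = [1.0 / len(rewards)] * len(rewards)

    loss = 0.0
    for actor_seq, drafter_seq, w in zip(actor_logits, drafter_logits, weights):
        # Mean per-token KL between actor (teacher) and drafter (student).
        per_token = [
            kl_divergence(softmax(a), softmax(d))
            for a, d in zip(actor_seq, drafter_seq)
        ]
        loss += w * sum(per_token) / len(per_token)
    return loss
```

Under these assumptions, a rollout with zero reward contributes nothing to the drafter update, while mismatches on high-reward rollouts dominate the loss. A real system would compute this on batched tensors in a deep learning framework rather than on Python lists.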
