SRT: Accelerating Reinforcement Learning via Speculative Rollout with Tree-Structured Cache

Chi-Chih Chang; Siqi Zhu; Zhichen Zeng; Haibin Lin; Jiaxuan You; Mohamed S. Abdelfattah; Ziheng Jiang; Xuehai Qian

arXiv:2601.09083·cs.LG·January 15, 2026

TL;DR

SRT is a model-free method that accelerates on-policy reinforcement learning for language models. A per-prompt tree-structured cache of earlier rollouts serves as the draft for speculative decoding, which reduces generation latency and per-token inference cost without sacrificing distributional correctness.

Contribution

SRT introduces a per-prompt tree-structured cache for speculative rollout, which speeds up RL training for language models without sacrificing distributional correctness.

Findings

01. Achieves up to 2.08x speedup in rollout time.

02. Reduces per-token inference cost.

03. Compatible with standard RL pipelines.

Abstract

We present Speculative Rollout with Tree-Structured Cache (SRT), a simple, model-free approach to accelerate on-policy reinforcement learning (RL) for language models without sacrificing distributional correctness. SRT exploits the empirical similarity of rollouts for the same prompt across training steps by storing previously generated continuations in a per-prompt tree-structured cache. During generation, the current policy uses this tree as the draft model for performing speculative decoding. To keep the cache fresh and improve draft model quality, SRT updates trees online from ongoing rollouts and proactively performs run-ahead generation during idle GPU bubbles. Integrated into standard RL pipelines (e.g., PPO, GRPO, and DAPO) and multi-turn settings, SRT consistently reduces generation and step latency and lowers per-token inference cost, achieving up to 2.08x wall-clock…
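The cache-as-draft idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the names `RolloutTree`, `draft`, and `verify` are hypothetical, the "follow the most-visited branch" drafting rule is an assumption, and the policy is stood in for by plain token-to-probability dictionaries. The verification step uses the standard speculative-sampling rule specialized to a deterministic draft: accept the drafted token with probability equal to the policy's probability for it, otherwise resample from the policy with that token removed. This keeps each emitted token distributed according to the policy.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """One trie node; children are keyed by token id."""
    count: int = 0
    children: dict = field(default_factory=dict)


class RolloutTree:
    """Hypothetical per-prompt cache of earlier rollouts, stored as a token trie."""

    def __init__(self):
        self.root = Node()

    def insert(self, tokens):
        """Add one finished rollout, counting how often each branch was taken."""
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, Node())
            node.count += 1

    def draft(self, prefix, max_len):
        """Match `prefix`, then follow the most-visited branch (assumed heuristic)."""
        node = self.root
        for tok in prefix:
            node = node.children.get(tok)
            if node is None:
                return []  # prefix never seen, so there is nothing to draft
        out = []
        while node.children and len(out) < max_len:
            tok, node = max(node.children.items(), key=lambda kv: kv[1].count)
            out.append(tok)
        return out


def verify(draft_tokens, policy_probs, rng):
    """Accept or correct drafted tokens so the output follows the policy.

    policy_probs[i] maps token -> probability under the current policy at
    draft position i. Returns the accepted tokens; after the first rejection
    one corrected token is appended and verification stops.
    """
    accepted = []
    for tok, probs in zip(draft_tokens, policy_probs):
        if rng.random() < probs.get(tok, 0.0):
            accepted.append(tok)  # accepted with probability p(tok)
            continue
        # Rejected: resample from the policy with the drafted token removed.
        rest = {t: q for t, q in probs.items() if t != tok and q > 0}
        if not rest:  # guard against rounding when p(tok) is effectively 1
            accepted.append(tok)
            continue
        toks = list(rest)
        accepted.append(rng.choices(toks, weights=[rest[t] for t in toks])[0])
        break
    return accepted


tree = RolloutTree()
for rollout in ([1, 2, 3], [1, 2, 4], [1, 2, 3]):
    tree.insert(rollout)

proposal = tree.draft(prefix=[1], max_len=4)
result = verify(proposal, [{2: 1.0}, {3: 1.0}], random.Random(0))
```

In a real system the per-position probabilities would come from a single batched forward pass of the policy over the drafted tokens, and a full implementation would also sample one extra token when every drafted token is accepted. Both are omitted here for brevity.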

Taxonomy

Topics: Reinforcement Learning in Robotics · Topic Modeling · Multimodal Machine Learning Applications