Beat the long tail: Distribution-Aware Speculative Decoding for RL Training

Zelei Shao; Vikranth Srivatsa; Sanjana Srivastava; Qingyang Wu; Alpay Ariyak; Xiaoxia Wu; Ameen Patel; Jue Wang; Percy Liang; Tri Dao; Ce Zhang; Yiying Zhang; Ben Athiwaratkun; Chenfeng Xu; Junxiong Wang

arXiv:2511.13841 · cs.LG · November 19, 2025



TL;DR

This paper introduces DAS, a distribution-aware speculative decoding method that accelerates reinforcement learning (RL) rollouts for large language models by leveraging rollout history and adaptive drafting, reducing rollout time by up to 50%.

Contribution

DAS is a novel framework that uses rollout history and a length-aware policy to speed up RL training without changing model outputs.
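Speculative decoding leaves outputs unchanged because the target model verifies every drafted token. The sketch below shows the standard speculative-sampling acceptance rule specialized to a drafter that deterministically proposes one token per position (as a suffix-tree lookup does). It is an illustration under stated assumptions, not DAS's verification code: the function name `verify_draft` and the layout of `target_probs` are hypothetical. With a point-mass drafter, the token emitted at each position is distributed exactly as the target distribution p.

```python
import random
from typing import Sequence


def verify_draft(
    draft: Sequence[int],
    target_probs: Sequence[Sequence[float]],
    rng: random.Random,
) -> list[int]:
    """Verify a deterministic draft against the target model's distributions.

    Hypothetical helper. target_probs[i] is the target model's next-token
    distribution at draft position i (all rows come from one batched forward
    pass). It has len(draft) + 1 rows so a bonus token can be drawn if every
    draft token is accepted.
    """
    if len(target_probs) != len(draft) + 1:
        raise ValueError("need one target distribution per draft token, plus one")
    out: list[int] = []
    for tok, p in zip(draft, target_probs):
        # The drafter puts all its mass on `tok` (q(tok) = 1), so the usual
        # acceptance test min(1, p(tok) / q(tok)) reduces to p(tok).
        if rng.random() < p[tok]:
            out.append(tok)
            continue
        # Rejected: resample from the residual max(0, p - q), which is p with
        # `tok` zeroed out; random.choices normalizes the weights itself.
        residual = list(p)
        residual[tok] = 0.0
        out.append(rng.choices(range(len(residual)), weights=residual)[0])
        return out
    # Every draft token was accepted: draw one extra token from the last row.
    last = target_probs[-1]
    out.append(rng.choices(range(len(last)), weights=last)[0])
    return out
```

Each call emits at least one correctly distributed token, so a well-matched draft yields several tokens per target forward pass, while a poor draft falls back to roughly one token per pass.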

Findings

01. Reduces rollout time by up to 50%.
02. Preserves training curves and model performance.
03. Effective on math and code reasoning tasks.

Abstract

Reinforcement learning (RL) post-training has become essential for aligning large language models (LLMs), yet its efficiency is increasingly constrained by the rollout phase, where long trajectories are generated token by token. We identify a major bottleneck, the long-tail distribution of rollout lengths, where a small fraction of long generations dominates wall-clock time, and a complementary opportunity: the availability of historical rollouts that reveal stable prompt-level patterns across training epochs. Motivated by these observations, we propose DAS, a Distribution-Aware Speculative decoding framework that accelerates RL rollouts without altering model outputs. DAS integrates two key ideas: an adaptive, nonparametric drafter built from recent rollouts using an incrementally maintained suffix tree, and a length-aware speculation policy that allocates more aggressive draft budgets to…
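The abstract describes a nonparametric drafter built from recent rollouts with an incrementally maintained suffix tree. As a rough illustration only, the sketch below swaps in a simpler structure: a depth-bounded suffix trie with per-node counts over a sliding window of rollouts. The class name, `max_depth`, `max_rollouts`, and the "longest matching suffix, then most frequent child" drafting rule are all assumptions; the paper's data structure, matching rule, and eviction policy may differ.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    count: int = 0  # number of stored suffixes passing through this node
    children: dict = field(default_factory=dict)  # token id -> Node


class SuffixTrieDrafter:
    """Hypothetical drafter over a sliding window of recent rollouts."""

    def __init__(self, max_depth: int = 16, max_rollouts: int = 256):
        self.max_depth = max_depth        # longest substring stored per suffix
        self.max_rollouts = max_rollouts  # size of the "recent rollouts" window
        self.root = Node()
        self.window: deque = deque()

    def _update(self, tokens, delta):
        # Insert (delta=+1) or remove (delta=-1) every suffix of `tokens`,
        # truncated to max_depth tokens.
        for start in range(len(tokens)):
            node = self.root
            for tok in tokens[start:start + self.max_depth]:
                child = node.children.get(tok)
                if child is None:
                    child = node.children[tok] = Node()
                child.count += delta
                if child.count == 0:
                    # Descendant counts never exceed their ancestor's, so the
                    # whole subtree is now empty and can be pruned.
                    del node.children[tok]
                    break
                node = child

    def add_rollout(self, tokens):
        tokens = list(tokens)
        self._update(tokens, +1)
        self.window.append(tokens)
        if len(self.window) > self.max_rollouts:
            self._update(self.window.popleft(), -1)  # evict the oldest rollout

    def _find(self, pattern):
        node = self.root
        for tok in pattern:
            node = node.children.get(tok)
            if node is None:
                return None
        return node

    def draft(self, context, max_tokens: int):
        # Match the longest suffix of the context seen in recent rollouts, then
        # follow the most frequent continuation at each step.
        context = list(context)
        for m in range(min(len(context), self.max_depth - 1), 0, -1):
            node = self._find(context[-m:])
            if node is None or not node.children:
                continue
            out = []
            while node.children and len(out) < max_tokens:
                tok, node = max(node.children.items(), key=lambda kv: kv[1].count)
                out.append(tok)
            return out
        return []
```

For example, after `add_rollout([1, 2, 3, 4, 5])`, calling `draft([9, 2, 3], max_tokens=10)` matches the suffix `[2, 3]` and proposes `[4, 5]`. A true suffix tree stores paths in compressed form; the bounded trie trades memory (up to `len(rollout) * max_depth` node visits per insertion) for simple incremental insertion and removal.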


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Algorithms