Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
Qinghao Hu, Shang Yang, Junxian Guo, Xiaozhe Yao, Yujun Lin, Yuxian Gu, Han Cai, Chuang Gan, Ana Klimovic, Song Han

TL;DR
This paper introduces TLT, a system that accelerates reasoning RL training with adaptive speculative decoding, speeding up training by more than 1.7x while preserving model accuracy.
Contribution
The paper presents an adaptive speculative decoding approach that pairs a lightweight draft model, trained continuously on idle GPUs, with a memory-efficient rollout engine to improve RL training efficiency.
Findings
Achieves over 1.7x speedup in RL training
Maintains model accuracy comparable to state-of-the-art
Produces a high-quality draft model for deployment
Abstract
The emergence of Large Language Models (LLMs) with strong reasoning capabilities marks a significant milestone, unlocking new frontiers in complex problem-solving. However, training these reasoning models, typically using Reinforcement Learning (RL), encounters critical efficiency bottlenecks: response generation during RL training exhibits a persistent long-tail distribution, where a few very long responses dominate execution time, wasting resources and inflating costs. To address this, we propose TLT, a system that accelerates reasoning RL training losslessly by integrating adaptive speculative decoding. Applying speculative decoding in RL is challenging due to the dynamic workloads, evolving target model, and draft model training overhead. TLT overcomes these obstacles with two synergistic components: (1) Adaptive Drafter, a lightweight draft model trained continuously on idle GPUs…
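To illustrate why speculative decoding can be lossless, here is a minimal toy sketch in Python. It is not TLT's implementation: `toy_target`, `toy_draft`, and the greedy acceptance rule are assumptions chosen for illustration, and the models are plain functions rather than neural networks. The draft proposes up to `k` tokens, the target checks them, and the emitted token is always the target's own choice, so the output matches plain greedy decoding while needing fewer verification passes.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function mapping a token sequence to the next token.
NextToken = Callable[[List[int]], int]


def toy_target(seq: List[int]) -> int:
    """Hypothetical target model: a deterministic next-token rule."""
    return (seq[-1] * 3 + len(seq)) % 7


def toy_draft(seq: List[int]) -> int:
    """Hypothetical draft model: agrees with the target except at every 5th position."""
    token = toy_target(seq)
    return (token + 1) % 7 if len(seq) % 5 == 0 else token


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one target call (one 'pass') per generated token."""
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(target(seq))
    return seq


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[int], max_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns the sequence and the number of verification passes."""
    seq = list(prompt)
    limit = len(prompt) + max_new
    passes = 0
    while len(seq) < limit:
        # 1) The draft cheaply proposes up to k tokens.
        ctx, proposal = list(seq), []
        for _ in range(min(k, limit - len(seq))):
            token = draft(ctx)
            proposal.append(token)
            ctx.append(token)
        # 2) The target verifies the proposal. A real engine would score all
        #    proposed positions in one batched forward pass; this toy counts
        #    that as a single pass and calls the target per position.
        passes += 1
        for token in proposal:
            expected = target(seq)
            seq.append(expected)  # always emit the target's own token
            if expected != token:
                break  # first mismatch: discard the rest of the proposal
    return seq, passes


baseline = greedy_decode(toy_target, [1], max_new=20)
speculative, passes = speculative_decode(toy_target, toy_draft, [1], max_new=20, k=4)
print(speculative == baseline, passes)
```

Because every emitted token comes from the target, a weak or stale draft model can only reduce the speedup, never change the output. That property is what makes the approach attractive when the target model keeps evolving during RL training.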
Videos
ASPLOS'26 - Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter · YouTube
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics
