Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter

Qinghao Hu; Shang Yang; Junxian Guo; Xiaozhe Yao; Yujun Lin; Yuxian Gu; Han Cai; Chuang Gan; Ana Klimovic; Song Han

arXiv:2511.16665 · cs.LG · March 23, 2026

TL;DR

This paper introduces TLT, a system that accelerates reasoning RL training with adaptive speculative decoding, achieving a speedup of over 1.7x while maintaining model accuracy.

Contribution

The paper presents an adaptive speculative decoding approach that combines a lightweight draft model, trained continuously on idle GPUs, with a memory-efficient rollout engine to improve RL training efficiency.
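The mechanism TLT builds on is speculative decoding: a small draft model proposes several tokens, and the target model verifies them, keeping only tokens it would have produced itself. The sketch below is a generic, textbook greedy variant, not TLT's implementation. The `Model` type, `toy_target`, and `toy_draft` are hypothetical stand-ins for real LLMs. RL rollouts usually sample rather than decode greedily; in that setting the lossless guarantee comes from a rejection-sampling acceptance rule, which this sketch omits.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a model: maps the token context to its greedy
# (argmax) next token. A real rollout engine would call an LLM here.
Model = Callable[[List[int]], int]


def greedy_decode(target: Model, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one target-model step per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out[len(prompt):]


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns (tokens, verification rounds)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        # 1) The cheap draft model proposes k tokens autoregressively.
        ctx = list(out)
        proposal = []
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2) The target checks all k positions. A real engine does this in a
        #    single forward pass; here the toy target is called per position.
        rounds += 1
        ctx = list(out)
        accepted = []
        for tok in proposal:
            expected = target(ctx)
            if tok != expected:
                accepted.append(expected)  # first mismatch: take the target's token
                break
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(target(ctx))  # all k accepted: one bonus target token
        out.extend(accepted)
    return out[len(prompt):len(prompt) + max_new], rounds


# Toy models (illustrative only): the draft agrees with the target except
# when the context length is a multiple of 5.
def toy_target(ctx: List[int]) -> int:
    return sum(ctx) % 7


def toy_draft(ctx: List[int]) -> int:
    return sum(ctx) % 7 if len(ctx) % 5 else (sum(ctx) + 1) % 7


if __name__ == "__main__":
    prompt = [3, 1]
    ref = greedy_decode(toy_target, prompt, 20)
    spec, rounds = speculative_decode(toy_target, toy_draft, prompt, 20, k=4)
    print(spec == ref, rounds)  # True 5
```

Every accepted token is one the target would have chosen anyway, so the output matches plain greedy decoding exactly. The gain comes from verifying up to k + 1 tokens per round (here 5 verification rounds instead of 20 sequential target steps), which pays off only when the draft usually agrees with the target. This is why keeping the drafter aligned with a target model that keeps changing during RL matters.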

Findings

01 Achieves over 1.7x speedup in RL training
02 Maintains model accuracy comparable to the state of the art
03 Produces a high-quality draft model for deployment

Abstract

The emergence of Large Language Models (LLMs) with strong reasoning capabilities marks a significant milestone, unlocking new frontiers in complex problem-solving. However, training these reasoning models, typically using Reinforcement Learning (RL), encounters critical efficiency bottlenecks: response generation during RL training exhibits a persistent long-tail distribution, where a few very long responses dominate execution time, wasting resources and inflating costs. To address this, we propose TLT, a system that accelerates reasoning RL training losslessly by integrating adaptive speculative decoding. Applying speculative decoding in RL is challenging due to the dynamic workloads, evolving target model, and draft model training overhead. TLT overcomes these obstacles with two synergistic components: (1) Adaptive Drafter, a lightweight draft model trained continuously on idle GPUs…
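The long-tail effect can be made concrete with a simplified model of one synchronous rollout step: all responses in the batch start together, each decoding step emits one token per unfinished response, and the step ends only when the longest response finishes. The function name and the response lengths below are hypothetical, and the model ignores real effects such as per-step latency changing as the batch shrinks.

```python
from typing import List, Tuple


def rollout_step_cost(lengths: List[int]) -> Tuple[int, float]:
    """Simplified synchronous rollout: the batch finishes when its longest
    response does. Returns (decoding steps, fraction of sequence slots that
    produced a token)."""
    steps = max(lengths)           # wall-clock is set by the longest response
    useful = sum(lengths)          # tokens actually generated
    slots = steps * len(lengths)   # slots held open for the whole step
    return steps, useful / slots


if __name__ == "__main__":
    # Hypothetical batch: seven 1,000-token responses and one 8,000-token response.
    lengths = [1000] * 7 + [8000]
    steps, utilization = rollout_step_cost(lengths)
    print(steps, utilization)  # 8000 0.234375
```

Here a single response out of eight stretches the step from 1,000 to 8,000 decoding steps, and fewer than a quarter of the sequence slots do useful work; for the last 7,000 steps only one sequence is decoding. Speculative decoding targets exactly this tail, letting each remaining sequence advance several tokens per target-model step.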

Videos

ASPLOS'26 - Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter (YouTube)

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics