BubbleSpec: Turning Long-Tail Bubbles into Speculative Rollout Drafts for Synchronous Reinforcement Learning
Yuhang Xu, Kaibin Tian, Yang Tian, Zhice Yang, Yifeng Yu, Yan Li, Shengzhong Liu, Fan Wu, Guihai Chen

TL;DR
BubbleSpec exploits long-tail bubbles in synchronous RL for LLMs to accelerate rollouts (up to 1.8x higher throughput) without sacrificing algorithmic exactness.
Contribution
It proposes BubbleSpec, a framework that uses the idle GPU time of faster data-parallel ranks to pre-generate rollout drafts for subsequent steps. These drafts then serve speculative decoding, which speeds up rollouts while preserving strict synchronization.
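To make the notion of a long-tail bubble concrete, here is a minimal sketch, assuming each data-parallel rank's rollout time for one step is known. In a strictly synchronous step every rank waits for the slowest one, so each rank's bubble is the gap between its own finish time and the straggler's. The function name `long_tail_bubbles` and its timing inputs are hypothetical illustrations, not part of BubbleSpec.

```python
from typing import List, Tuple


def long_tail_bubbles(rank_times: List[float]) -> Tuple[List[float], float]:
    """Per-rank idle time and overall bubble fraction for one synchronous rollout step.

    rank_times[i] is a (hypothetical, measured) wall-clock time that data-parallel
    rank i needs to finish its share of rollouts. In a strictly synchronous step,
    every rank waits at the barrier until the slowest rank (the straggler) is done.
    """
    if not rank_times:
        raise ValueError("need at least one rank")
    step_time = max(rank_times)  # the barrier releases when the straggler finishes
    idle = [step_time - t for t in rank_times]  # bubble window on each rank
    total_gpu_time = step_time * len(rank_times)
    # Fraction of all GPU time in this step that is spent idle.
    bubble_fraction = sum(idle) / total_gpu_time if total_gpu_time > 0 else 0.0
    return idle, bubble_fraction
```

For example, `long_tail_bubbles([10.0, 4.0, 7.0])` returns idle windows `[0.0, 6.0, 3.0]` and a bubble fraction of about 0.3. The two faster ranks sit idle for 6 and 3 time units; these are the kind of windows BubbleSpec proposes to fill with draft generation for later steps.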
Findings
Reduces decoding steps by 50%.
Increases rollout throughput by up to 1.8x.
Compatible with various RL frameworks and strategies.
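To put the decoding-step figure in perspective: if speculative decoding emits on average $E$ tokens per forward pass of the target model, the number of decoding steps shrinks by a factor of $1 - 1/E$, so a 50% reduction corresponds to $E = 2$. Under the idealized model from the standard speculative-decoding analysis, which assumes each draft token is accepted independently with probability $\alpha$ and a draft length of $k$ (an assumption here, not the paper's own analysis):

```latex
\mathbb{E}[\text{tokens per step}] = \frac{1-\alpha^{k+1}}{1-\alpha},
\qquad
\text{step reduction} = 1 - \frac{1}{\mathbb{E}[\text{tokens per step}]}
```

With purely illustrative values $\alpha = 0.6$ and $k = 3$, this gives about 2.18 tokens per step, i.e. roughly a 54% reduction in decoding steps.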
Abstract
Reinforcement Learning (RL) has become a cornerstone for improving the performance of Large Language Models (LLMs). However, its rollout phase is a major efficiency bottleneck, arising mainly from long-tail bubbles across data-parallel ranks, particularly in long-context scenarios where faster GPUs remain idle while waiting for stragglers. Existing solutions, such as partial rollout or asynchronous RL, mitigate these bubbles at the cost of the algorithm's strictly synchronous nature. We propose BubbleSpec, a novel framework that accelerates RL rollouts while strictly preserving mathematical exactness. Instead of attempting to eliminate bubbles, BubbleSpec exploits them: it uses the idle time windows of faster ranks to pre-generate rollout results for subsequent steps, which serve as drafts for speculative decoding. Unlike prior speculative methods that rely on…
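The abstract breaks off before describing how BubbleSpec verifies its drafts. Purely as a point of reference, the sketch below shows the standard speculative-sampling acceptance rule, which keeps the output distributed exactly as the target policy's as long as each draft token was actually sampled from its recorded draft distribution. It assumes the probabilities under which each pre-generated draft token was sampled (`q_probs`) are stored with the draft and that the current policy's probabilities (`p_probs`) are available at the same positions. The name `verify_draft` and the toy list-of-floats vocabulary are hypothetical; this is not BubbleSpec's code.

```python
import random
from typing import List, Sequence


def verify_draft(
    draft: Sequence[int],
    q_probs: Sequence[Sequence[float]],
    p_probs: Sequence[Sequence[float]],
    rng: random.Random,
) -> List[int]:
    """Standard speculative-sampling verification of one draft (illustrative only).

    draft:   k token ids proposed by the draft (e.g. a rollout pre-generated in a bubble)
    q_probs: k distributions the draft tokens were sampled from
    p_probs: k + 1 distributions of the current (target) policy at the same positions;
             the extra one yields a bonus token when every draft token is accepted
    """
    if len(q_probs) != len(draft) or len(p_probs) != len(draft) + 1:
        raise ValueError("expected len(q_probs) == k and len(p_probs) == k + 1")
    out: List[int] = []
    for x, q, p in zip(draft, q_probs, p_probs):
        # Accept draft token x with probability min(1, p(x) / q(x)).
        if q[x] > 0 and rng.random() < min(1.0, p[x] / q[x]):
            out.append(x)
            continue
        # Rejected: resample from the residual distribution max(0, p - q), renormalized.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        weights = residual if sum(residual) > 0 else list(p)
        out.append(rng.choices(range(len(p)), weights=weights)[0])
        return out  # ordinary decoding resumes from this point
    # Whole draft accepted: sample one extra token from the target's last distribution.
    last = p_probs[-1]
    out.append(rng.choices(range(len(last)), weights=last)[0])
    return out
```

When the draft and target distributions coincide, every draft token is accepted and one bonus token is added, so a single verification pass emits k + 1 tokens; when the target assigns zero probability to a draft token, verification stops there and emits a corrected token instead.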
