History Rhymes: Accelerating LLM Reinforcement Learning with RhymeRL
Jingkai He, Tianjian Li, Erhu Feng, Dong Du, Qian Liu, Tao Liu, Yubin Xia, Haibo Chen

TL;DR
RhymeRL accelerates large language model reinforcement learning by leveraging historical response similarities to improve rollout efficiency and workload balancing, achieving significant performance gains without accuracy loss.
Contribution
The paper introduces RhymeRL, a novel RL system for LLMs that uses historical similarity-based inference and scheduling to enhance training speed and GPU utilization.
Findings
Achieves 2.6x performance improvement over existing methods.
Maintains training accuracy without paradigm modifications.
Scales efficiently from dozens to thousands of GPUs.
Abstract
With the rapid advancement of large language models (LLMs), reinforcement learning (RL) has emerged as a pivotal methodology for enhancing the reasoning capabilities of LLMs. Unlike traditional pre-training approaches, RL encompasses multiple stages: rollout, reward, and training, which necessitates collaboration among various worker types. However, current RL systems continue to grapple with substantial GPU underutilization, due to two primary factors: (1) The rollout stage dominates the overall RL process due to test-time scaling; (2) Imbalances in rollout lengths (within the same batch) result in GPU bubbles. While prior solutions like asynchronous execution and truncation offer partial relief, they may compromise training accuracy for efficiency. Our key insight stems from a previously overlooked observation: rollout responses exhibit remarkable similarity across adjacent training…
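The second factor named in the abstract, GPU bubbles caused by uneven rollout lengths within a batch, can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the paper: it assumes one rollout per worker, generation time proportional to response length in tokens, and a synchronous batch in which every worker waits for the longest rollout. The function name `bubble_fraction` and the example lengths are hypothetical.

```python
def bubble_fraction(rollout_lengths):
    """Return the fraction of total worker-time spent idle in one batch.

    Assumptions (illustrative, not from the paper): one rollout per worker,
    time proportional to token count, and all workers wait for the longest
    rollout before the batch can proceed.
    """
    if not rollout_lengths:
        raise ValueError("rollout_lengths must not be empty")
    longest = max(rollout_lengths)
    busy = sum(rollout_lengths)              # token-steps of useful work
    total = longest * len(rollout_lengths)   # token-steps reserved by the batch
    return 1 - busy / total


# Hypothetical batch: three short responses and one long-tail response.
lengths = [1000, 1200, 900, 8000]
frac = bubble_fraction(lengths)
print(f"idle fraction: {frac:.3f}")
```

Under these assumptions, a single long-tail response leaves most of the batch's worker-time idle, while a batch of equal-length responses has no bubble at all.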
