Infinite Sampling: Efficient and Stable Grouped RL Training for Large Language Models
Liangyu Wang, Huanyi Xie, Xinhai Wang, Tianjin Huang, Mengdi Li, Di Wang

TL;DR
This paper introduces Infinite Sampling, a framework that makes group-based reinforcement learning with large group sizes practical for large language models. It reduces memory overhead and improves efficiency through micro sampling groups, continuous sampling, and a length-aware scheduler.
Contribution
It proposes a novel Infinite Sampling framework that decouples group size from memory constraints, improving scalability and stability in group-based RL training for LLMs.
Findings
Reduces peak memory usage by over 50%.
Improves throughput by over 25%.
Maintains full-length completions with larger groups.
Abstract
Group-based reinforcement learning algorithms such as Group Relative Policy Optimization (GRPO) have proven effective for fine-tuning large language models (LLMs) with human feedback. However, generating and storing multiple responses per prompt incurs substantial memory overhead, especially as the sample group size increases, limiting scalability under constrained hardware. We propose Infinite Sampling, a framework that enables efficient and stable GRPO training by decoupling group size from GPU memory usage. It consists of: (1) micro sampling groups that decompose large groups into memory-feasible rounds; (2) continuous sampling that interleaves generation across groups to improve utilization; and (3) a length-aware scheduler combining token-conditioned sequence length prediction with a two-stage plan: global grouping via a fully polynomial-time approximation scheme (FPTAS) and runtime refill via shortest-job-first (SJF). Experiments show that our…
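The three components named in the abstract can be illustrated with a toy cost model. The sketch below is not the authors' implementation: the function names are hypothetical, each decode slot is assumed to emit one token per step, and sequence lengths are assumed to be known exactly, whereas the paper predicts them. It splits a group into memory-feasible rounds, then compares the decode steps needed by fixed rounds against slot-by-slot refill in shortest-job-first order.

```python
import heapq
from typing import List


def micro_group_sizes(group_size: int, micro_size: int) -> List[int]:
    """Split one group of `group_size` samples into rounds of at most `micro_size`."""
    if group_size <= 0 or micro_size <= 0:
        raise ValueError("sizes must be positive")
    full, rest = divmod(group_size, micro_size)
    return [micro_size] * full + ([rest] if rest else [])


def round_based_steps(lengths: List[int], slots: int) -> int:
    """Fixed rounds: each round lasts until its longest sample finishes."""
    return sum(max(lengths[i:i + slots]) for i in range(0, len(lengths), slots))


def continuous_steps(lengths: List[int], slots: int) -> int:
    """Refill a slot as soon as it frees up, taking pending samples shortest first.

    Assumption: lengths are known up front; a real scheduler would only
    have predictions.
    """
    finish_times: List[int] = []  # min-heap of times at which busy slots free up
    for length in sorted(lengths):  # shortest-job-first order
        # Use an idle slot at time 0 if one exists, else wait for the earliest one.
        start = heapq.heappop(finish_times) if len(finish_times) >= slots else 0
        heapq.heappush(finish_times, start + length)
    return max(finish_times) if finish_times else 0


if __name__ == "__main__":
    lengths = [8, 2, 2, 8]  # made-up token counts for four samples of one group
    print(micro_group_sizes(10, 4))
    print(round_based_steps(lengths, slots=2))
    print(continuous_steps(lengths, slots=2))
```

In this made-up example, fixed rounds leave a slot idle while the long sample in each round finishes, so refilling slots as they free up takes fewer total steps. The model ignores memory accounting, prefill cost, and the FPTAS-based global grouping stage.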
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Reinforcement Learning in Robotics
