Infinite Sampling: Efficient and Stable Grouped RL Training for Large Language Models

Liangyu Wang; Huanyi Xie; Xinhai Wang; Tianjin Huang; Mengdi Li; Di Wang

arXiv:2506.22950 · cs.LG · July 1, 2025

TL;DR

This paper introduces Infinite Sampling, a framework that makes group-based reinforcement learning with large group sizes practical for large language models. It reduces memory overhead and improves efficiency through three components: micro sampling, continuous sampling, and a length-aware scheduler.

Contribution

It proposes a novel Infinite Sampling framework that decouples group size from memory constraints, improving scalability and stability in group-based RL training for LLMs.

Findings

01 Reduces peak memory usage by over 50%.

02 Improves throughput by over 25%.

03 Maintains full-length completions with larger groups.

Abstract

Group-based reinforcement learning algorithms such as Group Relative Policy Optimization (GRPO) have proven effective for fine-tuning large language models (LLMs) with human feedback. However, generating and storing multiple responses per prompt incurs substantial memory overhead, especially as the sample group size increases, which limits scalability under constrained hardware. We propose Infinite Sampling, a framework that enables efficient and stable GRPO training by decoupling group size from GPU memory usage. It consists of: (1) micro sampling groups that decompose large groups into memory-feasible rounds; (2) continuous sampling that interleaves generation across groups to improve utilization; and (3) a length-aware scheduler that combines token-conditioned sequence length prediction with a two-stage plan: global grouping via a fully polynomial-time approximation scheme (FPTAS) and runtime refill via shortest-job-first (SJF). Experiments show that our…
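The micro sampling idea in component (1) can be illustrated with a small sketch. This is not the authors' implementation: the function names (`micro_sample_rounds`, `grouped_advantages`, `sample_group`) and the `generate_and_score` callback are hypothetical, and the sketch assumes that only scalar rewards need to survive across rounds. It shows a group of size G being generated in rounds of at most g samples, with the usual group-normalized advantage computed once the full group's rewards are available.

```python
import statistics
from typing import Callable, List


def micro_sample_rounds(group_size: int, micro_size: int) -> List[int]:
    """Split one group of `group_size` samples into rounds of at most `micro_size`."""
    if group_size <= 0 or micro_size <= 0:
        raise ValueError("group_size and micro_size must be positive")
    full_rounds, remainder = divmod(group_size, micro_size)
    return [micro_size] * full_rounds + ([remainder] if remainder else [])


def grouped_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within the group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def sample_group(
    generate_and_score: Callable[[int], List[float]],  # hypothetical callback
    group_size: int,
    micro_size: int,
) -> List[float]:
    """Generate a full group round by round, keeping only rewards between rounds."""
    rewards: List[float] = []
    for n in micro_sample_rounds(group_size, micro_size):
        # Only `n` responses are in flight at once, so peak memory in this
        # sketch depends on micro_size rather than on group_size.
        rewards.extend(generate_and_score(n))
    return grouped_advantages(rewards)


def make_dummy_scorer() -> Callable[[int], List[float]]:
    """Stand-in for generation plus reward scoring: rewards 0.0, 1.0, 2.0, ..."""
    counter = 0

    def scorer(n: int) -> List[float]:
        nonlocal counter
        out = [float(counter + i) for i in range(n)]
        counter += n
        return out

    return scorer


if __name__ == "__main__":
    print(micro_sample_rounds(10, 4))
    print(sample_group(make_dummy_scorer(), group_size=10, micro_size=4))
```

In this sketch the advantage for each response is computed only after all rounds finish, so the statistics cover the whole group regardless of how small each round is.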

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Reinforcement Learning in Robotics