Sparse-RL: Breaking the Memory Wall in LLM Reinforcement Learning via Stable Sparse Rollouts

Sijia Luo; Xiaokang Zhang; Yuxuan Hu; Bohan Zhang; Ke Wang; Jinbo Su; Mengshu Sun; Lei Liang; Jing Zhang

arXiv:2601.10079·cs.LG·March 31, 2026


TL;DR

Sparse-RL is a method for stable reinforcement learning of large language models under sparse rollouts: it reduces the KV-cache memory overhead of rollouts while maintaining performance and robustness.

Contribution

The paper combines Sparsity-Aware Rejection Sampling with Importance-based Reweighting to stabilize RL training when rollouts use compressed key-value (KV) caches.

Findings

1. Reduces rollout memory overhead compared to dense methods
2. Maintains performance despite compression-induced information loss
3. Enhances model robustness during sparse inference
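
To give a sense of scale for the first finding, the following back-of-the-envelope sketch estimates KV-cache size with the standard formula (one key tensor and one value tensor per layer for every cached token). The model shape, batch size, sequence length, and the 1024-token retention budget are hypothetical values chosen for illustration; none of them come from the paper.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size, bytes_per_elem=2):
    """Bytes needed to hold keys and values for every layer and every cached token."""
    # Factor 2: one tensor for keys and one for values.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


# Hypothetical model shape and rollout settings (assumptions, not from the paper).
shape = dict(num_layers=32, num_kv_heads=8, head_dim=128, batch_size=64)

# Dense rollout: every one of the 8192 generated tokens stays in the cache.
dense = kv_cache_bytes(seq_len=8192, **shape)

# Sparse rollout: an assumed compression budget keeps 1024 cached tokens per sequence.
sparse = kv_cache_bytes(seq_len=1024, **shape)

print(f"dense:  {dense / 2**30:.1f} GiB")   # dense:  64.0 GiB
print(f"sparse: {sparse / 2**30:.1f} GiB")  # sparse: 8.0 GiB
```

With these assumed numbers the cache shrinks in direct proportion to the retained-token budget, which is why compressing the KV cache during rollouts is attractive on limited hardware.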

Abstract

Reinforcement Learning (RL) has become essential for eliciting complex reasoning capabilities in Large Language Models (LLMs). However, the substantial memory overhead of storing Key-Value (KV) caches during long-horizon rollouts acts as a critical bottleneck, often prohibiting efficient training on limited hardware. While existing KV compression techniques offer a remedy for inference, directly applying them to RL training induces a severe policy mismatch, leading to catastrophic performance collapse. To address this, we introduce Sparse-RL, which enables stable RL training under sparse rollouts. We show that instability arises from a fundamental policy mismatch among the dense old policy, the sparse sampler policy, and the learner policy. To mitigate this issue, Sparse-RL incorporates Sparsity-Aware Rejection Sampling and Importance-based Reweighting to correct the off-policy bias…
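
The abstract names two corrections but, as truncated here, gives no formulas for them. The sketch below is therefore not the authors' method. It illustrates one generic way to pair trajectory-level rejection with truncated importance weighting when rollouts are sampled from a sparse (KV-compressed) policy but should be scored against the dense old policy. The function names, the `min_ratio` rejection rule, and the `cap` value are all assumptions made for this example.

```python
import math


def token_ratios(logp_dense, logp_sparse):
    """Per-token importance ratio pi_dense(token) / pi_sparse(token)."""
    return [math.exp(d - s) for d, s in zip(logp_dense, logp_sparse)]


def accept(ratios, min_ratio=1e-2):
    """Assumed rejection rule: drop a rollout if any sampled token is
    far less likely under the dense policy than under the sparse sampler."""
    return all(r >= min_ratio for r in ratios)


def truncated_weights(ratios, cap=2.0):
    """Truncated importance weights: cap each ratio to bound variance."""
    return [min(r, cap) for r in ratios]


def filter_and_reweight(rollouts, min_ratio=1e-2, cap=2.0):
    """Return per-token weights for the rollouts that survive rejection."""
    kept = []
    for logp_dense, logp_sparse in rollouts:
        ratios = token_ratios(logp_dense, logp_sparse)
        if accept(ratios, min_ratio):
            kept.append(truncated_weights(ratios, cap))
    return kept


# Toy data: (log-probs under dense old policy, log-probs under sparse sampler).
rollouts = [
    ([-0.5, -1.0, -0.2], [-0.6, -0.9, -0.2]),  # policies agree closely: kept
    ([-0.5, -9.0, -0.2], [-0.6, -1.0, -0.2]),  # second ratio is exp(-8): rejected
    ([-0.1, -0.3], [-1.5, -0.3]),              # first ratio is exp(1.4): kept, capped at 2.0
]

weights = filter_and_reweight(rollouts)
print(len(weights))  # 2
```

In a training loop, weights like these would multiply the per-token policy-gradient terms of the surviving rollouts. How Sparse-RL actually defines its acceptance test and its weights is specified in the paper, not here.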


