Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training

Peng Cui; Boyao Yang; Jun Zhu

arXiv:2605.17003 · cs.LG · May 20, 2026

TL;DR

Learning-Zone Energy (LZE) is an online data selection method for reinforcement learning (RL) post-training of large language models. It focuses training on the most informative prompts, reducing compute while maintaining or improving performance.

Contribution

LZE introduces a theoretically grounded, fully online scoring framework that dynamically concentrates training on the model's active learning frontier, improving efficiency over uniform sampling methods.

Findings

1. Retains only 40% of training data per step while matching or surpassing full-data baselines.
2. Achieves +45.9% out-of-distribution gains on AIME25 and +18.2% on AMC23.
3. Reduces training FLOPs by an estimated 36%.

Abstract

Reinforcement Learning (RL) post-training has emerged as the dominant paradigm for eliciting mathematical reasoning in Large Language Models (LLMs), yet prevailing techniques such as GRPO and DAPO distribute rollout and gradient budgets nearly uniformly across prompts, squandering compute on samples that are already mastered or remain far beyond the model's current capability. To address this fundamental inefficiency, we propose Learning-Zone Energy (LZE), a theoretically grounded, fully online data selection framework that concentrates computation on the model's active learning frontier. At its core, we define a closed-form Learning-Zone Energy Score that fuses three complementary signals, an initial-difficulty anchor, a normalized outcome-uncertainty term, and a pass-rate momentum, into a single scalar that is provably aligned with the expected magnitude of group-relative policy…
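The abstract names three signals and a per-step retention budget but is cut off before the closed form is given, so the following is only a rough sketch of the general idea, not the paper's Learning-Zone Energy Score. Everything in it is an assumption made for illustration: the function names, the weights, the choice of `1 - initial_pass_rate` as the difficulty anchor, `4p(1-p)` as the normalized outcome uncertainty, and the absolute change in pass rate as the momentum term. The 40% keep fraction is taken from the findings above.

```python
from typing import List, Sequence


def outcome_uncertainty(pass_rate: float) -> float:
    """Bernoulli variance p(1-p), rescaled so it peaks at 1.0 when p = 0.5.

    Assumption: used here as a stand-in for a "normalized outcome-uncertainty term".
    """
    return 4.0 * pass_rate * (1.0 - pass_rate)


def illustrative_score(
    initial_pass_rate: float,
    current_pass_rate: float,
    previous_pass_rate: float,
    w_anchor: float = 0.2,       # hypothetical weight
    w_uncertainty: float = 0.6,  # hypothetical weight
    w_momentum: float = 0.2,     # hypothetical weight
) -> float:
    """Combine three signals into one scalar (hypothetical weighted sum).

    This is NOT the closed form from the paper; it only illustrates how an
    anchor, an uncertainty term, and a momentum term could be fused.
    """
    anchor = 1.0 - initial_pass_rate                        # harder at the start scores higher
    uncertainty = outcome_uncertainty(current_pass_rate)    # highest near a 50% pass rate
    momentum = abs(current_pass_rate - previous_pass_rate)  # pass rate is still moving
    return w_anchor * anchor + w_uncertainty * uncertainty + w_momentum * momentum


def select_top_fraction(scores: Sequence[float], keep_fraction: float = 0.4) -> List[int]:
    """Return indices of the highest-scoring prompts, keeping at least one."""
    k = max(1, round(keep_fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # (initial, current, previous) pass rates for five hypothetical prompts.
    prompts = [
        (1.00, 1.00, 1.00),    # already mastered
        (0.00, 0.00, 0.00),    # far beyond current capability
        (0.25, 0.50, 0.375),   # on the learning frontier
        (0.50, 0.75, 0.625),   # improving, still uncertain
        (0.875, 1.00, 1.00),   # recently mastered
    ]
    scores = [illustrative_score(*p) for p in prompts]
    print(select_top_fraction(scores, keep_fraction=0.4))
```

Under these assumed weights, prompts with pass rates near 0.5 that are still changing rank highest, while prompts that are always solved or never solved receive little or no score from the uncertainty and momentum terms.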

Code & Models

Repositories

Stellaris167/LZE (GitHub)