Addressing Performance Saturation for LLM RL via Precise Entropy Curve Control

Bolian Li; Yifan Wang; Yi Ding; Anamika Lochab; Ananth Grama; Ruqi Zhang

arXiv:2604.26326 · cs.LG · May 12, 2026

TL;DR

This paper introduces Entrocraft, a rejection-sampling method that controls entropy in RL for LLMs, preventing performance saturation and improving generalization and diversity.

Contribution

Entrocraft offers a simple, regularization-free way to schedule entropy precisely, so that RL training of large language models keeps yielding improvements instead of saturating.
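The sketch below illustrates, under stated assumptions, how rejection sampling could bias the advantage distribution toward a target entropy. It is not the Entrocraft rule. The function `entropy_rejection_filter`, the `gain` parameter, the `(response_id, advantage)` sample layout, and the heuristic that positive-advantage updates tend to sharpen the policy while negative-advantage updates tend to flatten it are all hypothetical choices made for illustration.

```python
import random
from typing import List, Optional, Tuple

# A sample is (response_id, advantage); this pairing is a hypothetical layout.
Sample = Tuple[str, float]


def entropy_rejection_filter(
    samples: List[Sample],
    current_entropy: float,
    target_entropy: float,
    gain: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[Sample]:
    """Hypothetical heuristic (not the Entrocraft rule): reject samples on one
    side of the advantage sign so the next update nudges entropy toward target."""
    rng = rng if rng is not None else random.Random(0)
    gap = target_entropy - current_entropy
    # Rejection probability grows with the size of the entropy gap, capped at 1.
    p_reject = min(1.0, gain * abs(gap))
    kept: List[Sample] = []
    for sample_id, adv in samples:
        # Entropy below target: drop some positive-advantage samples
        # (assumed here to sharpen the policy).
        drop_positive = gap > 0 and adv > 0
        # Entropy above target: drop some negative-advantage samples
        # (assumed here to flatten the policy).
        drop_negative = gap < 0 and adv < 0
        if (drop_positive or drop_negative) and rng.random() < p_reject:
            continue
        kept.append((sample_id, adv))
    return kept
```

Because the filter only looks at the sign and size of each advantage, it can sit in front of any advantage estimator, which is in the spirit of the estimator-agnostic design described in the abstract; how the actual method decides which samples to reject is specified in the paper and repository, not here.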

Findings

1. Entrocraft outperforms the baselines in generalization and diversity.
2. Linear entropy annealing yields the best performance among the entropy schedules tested.
3. Performance keeps improving for longer before it plateaus.
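A linear entropy annealing schedule can be expressed as a target entropy that interpolates between a start value and an end value over training. This is a minimal sketch: `linear_entropy_schedule`, `h_start`, and `h_end` are hypothetical names, the direction (high to low) is assumed from the word "annealing", and the endpoints the paper actually uses are not given on this page.

```python
def linear_entropy_schedule(
    step: int, total_steps: int, h_start: float, h_end: float
) -> float:
    """Target entropy moving linearly from h_start to h_end over total_steps,
    then held at h_end (hypothetical schedule for illustration)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Fraction of the schedule completed, clamped to [0, 1].
    frac = min(max(step / total_steps, 0.0), 1.0)
    return h_start + frac * (h_end - h_start)
```

At each RL step, the returned value would serve as the target that an entropy controller tries to match.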

Abstract

Reinforcement learning (RL) has enabled complex reasoning abilities in large language models (LLMs). However, most RL algorithms suffer from performance saturation, preventing continued gains as RL training scales. This problem can be characterized by the collapse of entropy, a key diagnostic for exploration in RL. Existing attempts focus on preventing entropy collapse through regularization or clipping. However, their resulting entropy curves often exhibit instability in the long term, which hinders performance gains. In this paper, we introduce Entrocraft, a simple rejection-sampling approach that realizes a user-customized entropy schedule by biasing the advantage distributions. Entrocraft requires no objective regularization and is advantage-estimator-agnostic. Theoretically, we relate per-step entropy change to the advantage distribution under minimal assumptions. This explains the…
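The link between per-step entropy change and the advantage distribution can be illustrated with a standard first-order identity for a single softmax distribution π = softmax(z): a small logit update Δz changes the entropy by approximately -Cov_{a~π}(log π(a), Δz(a)). The check below assumes, for illustration only, that logits move in proportion to advantages (Δz = η·A); this is not necessarily the paper's theorem or its assumptions, and all function names are hypothetical.

```python
import math
from typing import List


def softmax(z: List[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def entropy(p: List[float]) -> float:
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def predicted_entropy_change(z: List[float], dz: List[float]) -> float:
    """First-order prediction: dH ~= -Cov_{a~pi}(log pi(a), dz(a))."""
    p = softmax(z)
    logp = [math.log(pi) for pi in p]
    mean_logp = sum(pi * lp for pi, lp in zip(p, logp))
    mean_dz = sum(pi * d for pi, d in zip(p, dz))
    cov = sum(pi * (lp - mean_logp) * (d - mean_dz)
              for pi, lp, d in zip(p, logp, dz))
    return -cov


def actual_entropy_change(z: List[float], dz: List[float]) -> float:
    """Exact entropy difference after applying the logit update dz."""
    z_new = [a + b for a, b in zip(z, dz)]
    return entropy(softmax(z_new)) - entropy(softmax(z))
```

For example, with logits `[2.0, 0.5, -1.0]`, advantages `[1.0, -0.5, 0.3]`, and a small step η = 1e-4, both functions return a negative value of nearly the same size: a positive advantage on the already most likely action makes the covariance positive, so entropy drops. This is one way to see how an advantage distribution that keeps rewarding likely outputs drives entropy down.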

Code & Models

Repositories

lblaoke/entrocraft (GitHub)