EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control

Kai Yang; Xin Xu; Yangkun Chen; Weijie Liu; Jiafei Lyu; Zichuan Lin; Deheng Ye; Saiyong Yang

arXiv:2511.15248 · cs.LG · February 3, 2026

TL;DR

EntroPIC introduces an adaptive entropy stabilization method using proportional-integral control to enhance the stability and exploration efficiency of long-term large language model training.

Contribution

The paper presents a novel entropy stabilization technique with theoretical analysis and practical validation for large-scale LLM training.

Findings

01. Successfully maintains target entropy levels during training
02. Enables stable and efficient exploration in LLMs
03. Improves training stability over existing methods

Abstract

Long-term training of large language models (LLMs) requires maintaining stable exploration to prevent the model from collapsing into sub-optimal behaviors. Entropy is crucial in this context, as it controls exploration and helps avoid premature convergence to sub-optimal solutions. However, existing reinforcement learning methods struggle to maintain an appropriate level of entropy, as the training process involves a mix of positive and negative samples, each affecting entropy in different ways across steps. To address this, we propose Entropy stabilization via Proportional-Integral Control (EntroPIC), a novel method that adaptively adjusts the influence of positive and negative samples by dynamically tuning their loss coefficients. This approach stabilizes entropy throughout training, ensuring efficient exploration and steady progress. We provide a comprehensive theoretical analysis…
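The abstract describes the mechanism only at a high level. As a rough illustration, and not the authors' formulation, here is a minimal sketch of how a proportional-integral controller could turn the gap between a target entropy and the measured policy entropy into loss coefficients for positive- and negative-advantage samples. The names `mean_token_entropy`, `PIEntropyController`, and `weighted_pg_loss`, the gains `kp`/`ki`, the clamp `max_adjust`, and the `1 - u` / `1 + u` weighting are all hypothetical. The direction of the adjustment rests on a simplifying assumption: reinforcing positive samples tends to lower entropy, and penalizing negative samples tends to raise it.

```python
import math


def mean_token_entropy(prob_rows):
    """Average Shannon entropy (in nats) over per-token probability distributions."""
    if not prob_rows:
        raise ValueError("need at least one distribution")
    total = 0.0
    for probs in prob_rows:
        total += -sum(p * math.log(p) for p in probs if p > 0.0)
    return total / len(prob_rows)


class PIEntropyController:
    """Hypothetical PI controller: entropy error -> bounded coefficient adjustment."""

    def __init__(self, target_entropy, kp, ki, max_adjust=0.5):
        if not 0.0 <= max_adjust < 1.0:
            raise ValueError("max_adjust must be in [0, 1) so sample weights stay positive")
        self.target = target_entropy
        self.kp = kp
        self.ki = ki
        self.max_adjust = max_adjust
        self.integral = 0.0  # running sum of past errors (the "I" term)

    def step(self, current_entropy):
        # Positive error means entropy has fallen below the target.
        error = self.target - current_entropy
        self.integral += error
        u = self.kp * error + self.ki * self.integral
        # Clamp the output only; the integral itself is left unbounded here.
        return max(-self.max_adjust, min(self.max_adjust, u))


def weighted_pg_loss(logprobs, advantages, adjust):
    """REINFORCE-style surrogate with separate weights for positive/negative samples.

    Assumption: adjust > 0 (entropy too low) down-weights positive-advantage
    samples and up-weights negative-advantage ones; adjust < 0 does the reverse.
    """
    pos_w = 1.0 - adjust
    neg_w = 1.0 + adjust
    total = 0.0
    for lp, adv in zip(logprobs, advantages):
        w = pos_w if adv > 0 else neg_w
        # Minimizing this raises log-probs of positive-advantage samples
        # and lowers those of negative-advantage samples.
        total += -w * adv * lp
    return total / len(logprobs)


if __name__ == "__main__":
    controller = PIEntropyController(target_entropy=0.5, kp=1.0, ki=0.1)
    adjust = controller.step(current_entropy=0.3)
    print(f"adjust={adjust:.2f}")  # error 0.2, integral 0.2 -> 0.2 + 0.02 = 0.22
```

Suppose the target is 0.5 nats, the measured entropy is 0.3, `kp=1.0`, and `ki=0.1`. The first step then returns 0.22, so positive samples are weighted 0.78 and negative samples 1.22 for that update. The integral term accumulates a persistent entropy shortfall, so it can remove a steady-state offset that a purely proportional rule would leave.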

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Topic Modeling