Entropy-Gated Selective Policy Optimization: Token-Level Gradient Allocation for Hybrid Training of Large Language Models

Yuelin Hu; Zhengxue Cheng; Wei Liu; Li Song

arXiv:2602.03309·cs.LG·February 4, 2026

TL;DR

The paper introduces EGSPO, a token-level gradient modulation method for hybrid training of large language models that raises scores on reasoning benchmarks with little extra computation.

Contribution

It proposes an entropy-gated gradient allocation mechanism for hybrid training of large language models: high-entropy tokens receive full updates to encourage exploration, while low-entropy tokens receive attenuated updates to preserve existing knowledge.

Findings

01 Improves AIME scores by 3.8% over baseline.

02 Enhances MATH benchmark performance by 2.9%.

03 Adds only 3.4% computational overhead.

Abstract

Hybrid training methods for large language models combine supervised fine-tuning (SFT) on expert demonstrations with reinforcement learning (RL) on model rollouts, typically at the sample level. We propose Entropy-Gated Selective Policy Optimization (EGSPO), a three-stage framework that extends sample-level mixing with token-level gradient modulation. Stage 1, SFT expert learning, establishes a reliable warm-up policy using expert demonstrations with a pure SFT loss. Stage 2, RL rollout generation, samples trajectories from the current policy and computes per-token predictive entropy. Stage 3, the EGSPO mechanism, applies entropy-gated gradient allocation: a predictive entropy module routes high-entropy tokens to full PPO updates to encourage exploration, and low-entropy tokens to attenuated PPO updates to reduce variance and preserve knowledge. Critically, both branches incorporate…
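The gating step in Stages 2 and 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation: it covers only the per-token entropy computation and the routing of tokens to full or attenuated updates, and it says nothing about the part of the abstract that is cut off. The function names, the hard threshold of 1.0 nats, the attenuation weight of 0.1, and the use of a plain list of per-token losses in place of PPO token losses are all assumptions made for illustration.

```python
import math


def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over one token's logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_gate(entropies, threshold=1.0, low_weight=0.1):
    """Hypothetical gate: weight 1.0 for high-entropy tokens, low_weight otherwise.

    Both `threshold` and `low_weight` are illustrative values, not taken from the paper.
    """
    return [1.0 if h >= threshold else low_weight for h in entropies]


def gated_loss(per_token_losses, weights):
    """Mean of per-token losses after scaling each one by its gate weight.

    `per_token_losses` stands in for whatever per-token policy loss is used.
    """
    scaled = [w * loss for w, loss in zip(weights, per_token_losses)]
    return sum(scaled) / len(scaled)


# Two tokens: one with a uniform distribution (uncertain), one sharply peaked (confident).
entropies = [token_entropy([0.0, 0.0, 0.0, 0.0]), token_entropy([10.0, 0.0, 0.0, 0.0])]
weights = entropy_gate(entropies)
loss = gated_loss([2.0, 2.0], weights)
```

Here the uniform token has entropy ln 4 and keeps its full weight, while the peaked token falls below the assumed threshold and contributes only a tenth of its loss. A real system would apply the same weights to tensors of token losses before backpropagation, so the gate scales gradients rather than scalar values.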

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification