Rethinking GSPO: The Perplexity-Entropy Equivalence

Chi Liu

arXiv:2510.23142 · cs.LG · October 28, 2025

TL;DR

This paper reveals that GSPO's importance ratios are equivalent to inverse perplexity ratios and exponential cross-entropy changes, providing an information-theoretic perspective that explains its empirical stability and variance reduction.

Contribution

It establishes a novel connection between GSPO importance weights and information-theoretic quantities, offering new insights into its behavior and stability.

Findings

1. GSPO importance ratios are equivalent to inverse perplexity ratios.
2. The perplexity-entropy relationship explains variance reduction.
3. Empirical validation on reasoning tasks supports the theory.
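The variance-reduction finding can be illustrated with a toy simulation. This is a minimal sketch, not the paper's experiment. It assumes, purely for illustration, that the per-token log-ratios log(π_θ / π_θ_old) of a response are i.i.d. Gaussian with standard deviation `sigma`. Under that assumption the log of the raw sequence ratio is a sum of |y| terms, while the log of GSPO's length-normalized weight is their mean (a geometric average in the probability domain). The function name `log_weight_variances` and its parameters are hypothetical.

```python
import random
import statistics


def log_weight_variances(seq_len, sigma, trials=5000, seed=0):
    """Toy model: compare log-domain variance of the raw sequence ratio
    versus the length-normalized (geometric-mean) GSPO weight.

    Assumption (illustrative only): token log-ratios log(pi_theta / pi_old)
    are i.i.d. N(0, sigma^2) and independent across tokens.
    """
    rng = random.Random(seed)
    log_seq, log_gspo = [], []
    for _ in range(trials):
        # Hypothetical per-token log-ratios for one sampled response y.
        token_log_ratios = [rng.gauss(0.0, sigma) for _ in range(seq_len)]
        total = sum(token_log_ratios)
        log_seq.append(total)              # log of pi_theta(y|x) / pi_old(y|x)
        log_gspo.append(total / seq_len)   # log of s(theta) = (ratio) ** (1/|y|)
    return statistics.pvariance(log_seq), statistics.pvariance(log_gspo)


v_seq, v_gspo = log_weight_variances(seq_len=64, sigma=0.1)
print(f"Var[log raw ratio] = {v_seq:.4f}, Var[log s] = {v_gspo:.6f}")
```

Because the normalized log-weight is the sum divided by |y|, the two variances differ by exactly a factor of |y|² (up to rounding). Under the i.i.d. assumption the raw log-ratio variance grows roughly as |y|·σ², while the GSPO log-weight variance shrinks roughly as σ²/|y|.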

Abstract

We provide a new perspective on GSPO's length-normalized importance ratios by establishing their connection to information-theoretic quantities. We show that GSPO's sequence-level weight $s(\theta) = (\pi_{\theta} / \pi_{\theta_{old}})^{1/|y|}$ can be equivalently expressed as the inverse perplexity ratio $\mathrm{PPL}_{\theta_{old}} / \mathrm{PPL}_{\theta}$ and as the exponential cross-entropy change $\exp(\Delta H)$. While the perplexity-entropy relationship follows from standard definitions, this observation provides a useful lens for understanding GSPO: the algorithm weights policy gradient updates by perplexity ratios, offering an information-theoretic interpretation of the importance weights. This perspective helps explain GSPO's empirical properties, including log-domain variance reduction through geometric averaging and stability in training mixture-of-experts models. We…
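The three forms of the weight can be checked numerically. This is a minimal sketch assuming that per-token log-probabilities log π(y_t | x, y_<t) of one sampled response y are available under both the current and the old policy, that perplexity is PPL = exp(H) with H the per-token cross-entropy −(1/|y|) Σ_t log π(y_t | x, y_<t), and that the sign convention is ΔH = H_θ_old − H_θ (the convention under which exp(ΔH) equals s(θ)). The function name `gspo_weight_three_ways` is hypothetical.

```python
import math


def gspo_weight_three_ways(logp_new, logp_old):
    """Compute the GSPO weight s(theta) for one response y in three ways.

    logp_new, logp_old: per-token log-probabilities log pi(y_t | x, y_<t)
    of the same response y under the current and the old policy.
    """
    n = len(logp_new)
    if n == 0 or n != len(logp_old):
        raise ValueError("need two non-empty log-prob lists of equal length |y|")

    # (1) Definition: (pi_theta(y|x) / pi_theta_old(y|x)) ** (1/|y|).
    # Exponentiating full-sequence log-probs can underflow for long y;
    # acceptable here because this is only a small numerical check.
    p_new = math.exp(sum(logp_new))
    p_old = math.exp(sum(logp_old))
    s_def = (p_new / p_old) ** (1.0 / n)

    # (2) Inverse perplexity ratio PPL_old / PPL_theta, with PPL = exp(H)
    # and H = per-token cross-entropy of y under each policy.
    h_new = -sum(logp_new) / n
    h_old = -sum(logp_old) / n
    s_ppl = math.exp(h_old) / math.exp(h_new)

    # (3) Exponential cross-entropy change, assuming Delta H = H_old - H_theta.
    s_dh = math.exp(h_old - h_new)
    return s_def, s_ppl, s_dh


# Toy per-token log-probs for a 3-token response (hypothetical values).
s_def, s_ppl, s_dh = gspo_weight_three_ways([-0.5, -1.2, -0.3], [-0.7, -1.0, -0.6])
print(s_def, s_ppl, s_dh)  # all three agree, approximately exp(0.1) ≈ 1.105
```

In practice form (3) is the numerically safe one, since it works entirely with averaged log-probabilities and never exponentiates a full-sequence log-likelihood.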
