Rethinking GSPO: The Perplexity-Entropy Equivalence
Chi Liu

TL;DR
This paper reveals that GSPO's importance ratios are equivalent to inverse perplexity ratios and exponential cross-entropy changes, providing an information-theoretic perspective that explains its empirical stability and variance reduction.
Contribution
It establishes a novel connection between GSPO importance weights and information-theoretic quantities, offering new insights into its behavior and stability.
Findings
GSPO importance ratios are equivalent to inverse perplexity ratios.
The perplexity-entropy relationship helps explain the log-domain variance reduction that comes from geometric averaging.
Empirical validation on reasoning tasks supports the theory.
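The first finding can be sketched as a short derivation. The notation here is an assumption for illustration, not taken from the paper: s(θ) is the length-normalized sequence-level ratio as defined for GSPO, H is the per-token average negative log-likelihood of the sampled response y, and PPL is its exponential.

```latex
\begin{align*}
H_\theta &= -\frac{1}{|y|}\sum_{t=1}^{|y|} \log \pi_\theta(y_t \mid x, y_{<t}),
\qquad \mathrm{PPL}_\theta = \exp(H_\theta) \\
s(\theta) &= \left(\frac{\pi_\theta(y \mid x)}{\pi_{\theta_{\mathrm{old}}}(y \mid x)}\right)^{1/|y|}
 = \exp\!\left(\frac{1}{|y|}\sum_{t=1}^{|y|}
   \left[\log \pi_\theta(y_t \mid x, y_{<t}) - \log \pi_{\theta_{\mathrm{old}}}(y_t \mid x, y_{<t})\right]\right) \\
 &= \exp\!\left(-(H_\theta - H_{\theta_{\mathrm{old}}})\right)
 = \frac{\mathrm{PPL}_{\theta_{\mathrm{old}}}}{\mathrm{PPL}_\theta}
\end{align*}
```

Under this reading, log s(θ) is the mean of the per-token log-ratios. If those log-ratios were roughly independent with common variance σ² (a simplifying assumption made here, not a claim from the paper), the variance of log s(θ) would be about σ²/|y|, whereas the unnormalized sequence log-ratio would have variance about |y|σ².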
Abstract
We provide a new perspective on GSPO's length-normalized importance ratios by establishing their connection to information-theoretic quantities. We show that GSPO's sequence-level weight $s(\theta) = (\pi_\theta / \pi_{\theta_{\mathrm{old}}})^{1/|y|}$ can be equivalently expressed as the inverse perplexity ratio $s(\theta) = \mathrm{PPL}_{\theta_{\mathrm{old}}} / \mathrm{PPL}_\theta$ and as the exponential cross-entropy change $s(\theta) = \exp(-\Delta H)$. While the perplexity-entropy relationship follows from standard definitions, this observation provides a useful lens for understanding GSPO: the algorithm weights policy gradient updates by perplexity ratios, offering an information-theoretic interpretation of the importance weights. This perspective helps explain GSPO's empirical properties, including log-domain variance reduction through geometric averaging and stability in training mixture-of-experts models. We…
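The equivalence described in the abstract can be checked numerically. The following is a minimal sketch, not the paper's code: the function names (`gspo_weight`, `perplexity`, `cross_entropy`) and the synthetic per-token log-probabilities are hypothetical, and perplexity is assumed to be the exponential of the per-token average negative log-likelihood of the sampled response.

```python
import math
import random


def gspo_weight(logp_new, logp_old):
    """Length-normalized sequence ratio: (pi_new(y|x) / pi_old(y|x)) ** (1 / |y|)."""
    n = len(logp_new)
    return math.exp((sum(logp_new) - sum(logp_old)) / n)


def cross_entropy(logp):
    """Per-token average negative log-likelihood of the sampled tokens."""
    return -sum(logp) / len(logp)


def perplexity(logp):
    """Perplexity as exp of the per-token cross-entropy."""
    return math.exp(cross_entropy(logp))


random.seed(0)

# Hypothetical per-token log-probs for one 32-token response under the old policy.
logp_old = [math.log(random.uniform(0.05, 0.95)) for _ in range(32)]
# The new policy is modeled as a small perturbation, clamped so probabilities stay below 1.
logp_new = [min(lp + random.gauss(0.0, 0.1), -1e-3) for lp in logp_old]

s = gspo_weight(logp_new, logp_old)
ppl_ratio = perplexity(logp_old) / perplexity(logp_new)  # inverse perplexity ratio
exp_dh = math.exp(-(cross_entropy(logp_new) - cross_entropy(logp_old)))  # exp(-delta H)

# All three expressions agree up to floating-point error.
assert math.isclose(s, ppl_ratio, rel_tol=1e-9)
assert math.isclose(s, exp_dh, rel_tol=1e-9)
```

Working from summed log-probabilities rather than multiplying raw probabilities avoids underflow on long responses, which is how such ratios are usually computed in practice.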
