Group Orthogonalized Policy Optimization: Group Policy Optimization as Orthogonal Projection in Hilbert Space

Wang Zixian

arXiv:2602.21269 · cs.LG · February 26, 2026

TL;DR

GOPO introduces a geometric Hilbert space approach to policy optimization, enabling exact sparsity and stable gradients, leading to improved performance and stability in large language model alignment tasks.

Contribution

This paper proposes a novel Hilbert space-based alignment algorithm, GOPO, which avoids KL divergence curvature issues and enforces sparsity through orthogonal projection, with practical finite-sample implementation.

Findings

01. Achieves competitive generalization on reasoning benchmarks.

02. Maintains stable gradient dynamics and entropy during training.

03. Outperforms clipping-based methods in plateau regimes.

Abstract

We present Group Orthogonalized Policy Optimization (GOPO), a new alignment algorithm for large language models derived from the geometry of Hilbert function spaces. Instead of optimizing on the probability simplex and inheriting the exponential curvature of Kullback-Leibler divergence, GOPO lifts alignment into the Hilbert space $L^2(\pi_k)$ of square-integrable functions with respect to the reference policy. Within this space, the simplex constraint reduces to a linear orthogonality condition $\langle v, 1 \rangle = 0$, defining a codimension-one subspace $H_0$. Minimizing distance to an unconstrained target $u_\star$ yields the work-dissipation functional $J(v) = \langle g, v \rangle - \frac{\mu}{2} \|v\|^2$, whose maximizer follows directly from the Hilbert projection theorem. Enforcing the boundary $v \geq -1$ produces a bounded Hilbert projection that induces exact sparsity, assigning zero probability to catastrophically poor…
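
To make the projection described in the abstract concrete, here is a minimal NumPy sketch on a finite action set. It is an illustration under stated assumptions, not the paper's implementation. It assumes that v is the density ratio minus one (so the updated policy is pi * (1 + v)), that the inner product is the pi-weighted sum <f, h> = sum_i pi_i f_i h_i, and that g is a per-action signal such as an advantage. The function name `bounded_projection`, the bisection solver, and the toy numbers are hypothetical. Under these assumptions, maximizing J(v) subject to <v, 1> = 0 and v >= -1 gives v_i = max(-1, (g_i - lam) / mu), where the scalar lam is chosen so that the constraint holds.

```python
import numpy as np

def bounded_projection(g, pi, mu, iters=100):
    """Maximize <g, v> - (mu / 2) * ||v||^2 in the pi-weighted inner product,
    subject to <v, 1> = 0 and v >= -1 (assumed reading of the abstract).

    The KKT conditions give v_i = max(-1, (g_i - lam) / mu); the multiplier
    lam is found by bisection on the constraint sum_i pi_i * v_i = 0.
    """
    g = np.asarray(g, dtype=float)
    pi = np.asarray(pi, dtype=float)
    lo = float(np.dot(pi, g))  # unclipped multiplier E_pi[g]; constraint sum >= 0 here
    hi = float(g.max() + mu)   # every coordinate clips to -1; constraint sum = -1 here
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        v = np.maximum(-1.0, (g - mid) / mu)
        if np.dot(pi, v) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return np.maximum(-1.0, (g - lam) / mu)

# Toy example: four actions, uniform reference policy, one very poor action.
pi = np.array([0.25, 0.25, 0.25, 0.25])
g = np.array([1.0, 0.5, 0.0, -3.0])
v = bounded_projection(g, pi, mu=1.0)
pi_new = pi * (1.0 + v)  # assumed update rule: new policy = pi * (1 + v)
print(v)
print(pi_new)
```

In this toy case the last action hits the bound v = -1 exactly, so its updated probability is exactly zero while the remaining mass still sums to one. This is the kind of exact sparsity the abstract refers to, in contrast to an exponential-tilting update, which would leave every action with strictly positive probability.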

Taxonomy

Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Machine Learning in Materials Science