Group Orthogonalized Policy Optimization: Group Policy Optimization as Orthogonal Projection in Hilbert Space
Wang Zixian

TL;DR
GOPO recasts policy optimization for large language model alignment as an orthogonal projection in a Hilbert space, which yields exact sparsity and stable gradients. The paper reports competitive generalization on reasoning benchmarks and stable training dynamics.
Contribution
The paper proposes GOPO, a Hilbert-space alignment algorithm that avoids the exponential curvature of KL divergence, enforces exact sparsity through a bounded orthogonal projection, and admits a practical finite-sample implementation.
Findings
Achieves competitive generalization on reasoning benchmarks.
Maintains stable gradient dynamics and entropy during training.
Outperforms clipping-based methods in plateau regimes.
Abstract
We present Group Orthogonalized Policy Optimization (GOPO), a new alignment algorithm for large language models derived from the geometry of Hilbert function spaces. Instead of optimizing on the probability simplex and inheriting the exponential curvature of Kullback-Leibler divergence, GOPO lifts alignment into the Hilbert space L2(pi_k) of square-integrable functions with respect to the reference policy. Within this space, the simplex constraint reduces to a linear orthogonality condition <v, 1> = 0, defining a codimension-one subspace H0. Minimizing distance to an unconstrained target u_star yields the work-dissipation functional J(v) = <g, v> - (mu / 2) ||v||^2, whose maximizer follows directly from the Hilbert projection theorem. Enforcing the boundary v >= -1 produces a bounded Hilbert projection that induces exact sparsity, assigning zero probability to catastrophically poor…
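The projection step described in the abstract can be illustrated with a small finite-sample sketch. This is not the paper's implementation. It assumes a group of samples drawn from pi_k, so that the inner product <f, h> becomes an empirical mean of f * h, and it assumes v plays the role of the relative density change pi / pi_k - 1, which is what the constraints <v, 1> = 0 and v >= -1 suggest. The function names, the bisection solver, and the example scores are hypothetical choices made for illustration only.

```python
import numpy as np


def unconstrained_maximizer(g, mu):
    """Maximize mean(g * v) - (mu / 2) * mean(v ** 2) subject to mean(v) = 0.

    With only the orthogonality constraint <v, 1> = 0, the maximizer is the
    centered signal scaled by 1 / mu.
    """
    g = np.asarray(g, dtype=float)
    return (g - g.mean()) / mu


def bounded_maximizer(g, mu, iters=100):
    """Same objective with the additional bound v >= -1.

    The KKT conditions give v_i = max(-1, (g_i - lam) / mu), where the scalar
    lam is chosen so that mean(v) = 0. mean(v) is non-increasing in lam, so
    lam is found here by bisection (an illustrative solver choice).
    """
    g = np.asarray(g, dtype=float)

    def v_of(lam):
        return np.maximum(-1.0, (g - lam) / mu)

    # mean(v) >= 0 at lam = min(g) and mean(v) <= 0 at lam = max(g).
    lo, hi = g.min(), g.max()
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if v_of(mid).mean() > 0.0:
            lo = mid
        else:
            hi = mid
    return v_of(0.5 * (lo + hi))


# Hypothetical group of four scores; the last sample is far worse than the rest.
g = [1.0, 0.5, 0.0, -10.0]
v_free = unconstrained_maximizer(g, mu=1.0)
v_box = bounded_maximizer(g, mu=1.0)
ratio = 1.0 + v_box  # assumed density ratio pi / pi_k on the samples
```

In this example the unconstrained solution sends the last coordinate below -1, which would correspond to a negative probability under the assumed reading of v. The bounded solution pins that coordinate at exactly -1, so its ratio is exactly zero, while the remaining coordinates are shifted so that the mean of v stays zero.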
Taxonomy
Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Machine Learning in Materials Science
