PPO in the Fisher-Rao geometry

Razvan-Andrei Lascu; David Šiška; Łukasz Szpruch

arXiv:2506.03757 · cs.LG · February 2, 2026

Open Access

TL;DR

This paper introduces Fisher-Rao PPO (FR-PPO), a variant of PPO built on the Fisher-Rao geometry. It comes with monotonic policy improvement guarantees and, in the direct parametrization setting, sub-linear convergence that does not depend on the dimension of the state or action space. It also shows competitive empirical performance.

Contribution

The paper derives a tighter surrogate objective than PPO's clipped surrogate and uses it to define Fisher-Rao PPO, which carries formal guarantees of monotonic policy improvement and sub-linear convergence.

Findings

01 In the direct parametrization setting, achieves sub-linear convergence with no dependence on the dimensions of the state or action space.

02 Provides monotonic policy improvement guarantees.

03 Performs well empirically across standard RL tasks.

Abstract

Proximal Policy Optimization (PPO) is widely used in reinforcement learning due to its strong empirical performance, yet it lacks formal guarantees for policy improvement and convergence. PPO's clipped surrogate objective is motivated by a lower bound on a linearization of the value function in a flat geometry setting. We derive a tighter surrogate objective and introduce Fisher-Rao PPO (FR-PPO) by leveraging the Fisher-Rao (FR) geometry. Our scheme provides strong theoretical guarantees, including monotonic policy improvement. In the direct parametrization setting, we show that FR-PPO achieves sub-linear convergence with no dependence on action or state space dimensions, and for parametrized policies we further obtain sub-linear convergence up to the compatible function approximation error. Finally, although our primary focus is theoretical, we also demonstrate empirically that FR-PPO…
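For readers unfamiliar with the clipped surrogate objective the abstract refers to, the following is a minimal sketch of the standard PPO form, written in plain Python. It is not the tighter FR-PPO surrogate, which this page does not state. The function name `ppo_clip_surrogate` and the default `eps=0.2` are illustrative assumptions, and the probability ratios and advantage estimates are taken as given inputs.

```python
def ppo_clip_surrogate(ratios, advantages, eps=0.2):
    """Standard PPO clipped surrogate, averaged over samples (to be maximized).

    ratios:     probability ratios pi_new(a|s) / pi_old(a|s), one per sample
    advantages: advantage estimates under the old policy, one per sample
    eps:        clipping range (assumed value; 0.2 is a common choice)
    """
    if len(ratios) != len(advantages) or len(ratios) == 0:
        raise ValueError("ratios and advantages must be non-empty and equal in length")

    total = 0.0
    for r, a in zip(ratios, advantages):
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped_r = min(max(r, 1.0 - eps), 1.0 + eps)
        # Taking the minimum makes the clipped term a pessimistic (lower)
        # bound on the unclipped surrogate r * a.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


# Hypothetical inputs for illustration only.
example_value = ppo_clip_surrogate([1.5, 0.5, 1.0], [1.0, -1.0, 2.0])
```

By construction the clipped value never exceeds the unclipped average of `r * a`, which is the sense in which the clipped objective acts as a lower bound on the linearized objective.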

Taxonomy

Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research

Methods: Entropy Regularization · Proximal Policy Optimization