Proximal Policy Optimization in Path Space: A Schrödinger Bridge Perspective

Yuehu Gong; Zeyuan Wang; Yulin Chen; Yanwei Fu

arXiv:2603.21621 · cs.LG · March 24, 2026

TL;DR

This paper introduces GSB-PPO, a path-space reinforcement learning framework inspired by Schrödinger Bridge theory that improves the stability and performance of generative policies relative to traditional action-space PPO.

Contribution

It extends PPO to trajectory-level generative policies using Schrödinger Bridge concepts, providing a unified and more stable on-policy optimization method.

Findings

01. The penalty-based objective, GSB-PPO-Penalty, outperforms the clipping-based GSB-PPO-Clip in both stability and performance.

02. Path-space proximal regularization improves the training of generative policies.

03. The framework unifies trajectory-level and action-level policy optimization.

Abstract

On-policy reinforcement learning with generative policies is promising but remains underexplored. A central challenge is that proximal policy optimization (PPO) is traditionally formulated in terms of action-space probability ratios, whereas diffusion- and flow-based policies are more naturally represented as trajectory-level generative processes. In this work, we propose GSB-PPO, a path-space formulation of generative PPO inspired by the Generalized Schrödinger Bridge (GSB). Our framework lifts PPO-style proximal updates from terminal actions to full generation trajectories, yielding a unified view of on-policy optimization for generative policies. Within this framework, we develop two concrete objectives: a clipping-based objective, GSB-PPO-Clip, and a penalty-based objective, GSB-PPO-Penalty. Experimental results show that while both objectives are compatible with on-policy…
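
To make the contrast between the two kinds of objective concrete, the following is a minimal NumPy sketch of a clipped surrogate and a KL-penalized surrogate evaluated on a path-level probability ratio. It is not the paper's exact GSB-PPO-Clip or GSB-PPO-Penalty formulation, which the abstract does not spell out. It assumes the generative policy is a Markov chain over generation steps with tractable per-step log-probabilities, so that the path log-ratio is the sum of per-step log-ratios. All function names, the clipping range `eps`, and the penalty weight `beta` are hypothetical choices for illustration.

```python
import numpy as np


def path_log_ratio(new_step_logp, old_step_logp):
    """Log-ratio of two path distributions for a batch of generation trajectories.

    Assumption: both policies factor into per-step transitions, so the
    path log-probability is the sum over steps. Inputs have shape
    (batch, num_steps); the result has shape (batch,).
    """
    return np.sum(new_step_logp - old_step_logp, axis=1)


def clip_surrogate(log_ratio, advantages, eps=0.2):
    """Standard PPO clipped surrogate, applied here to a path-level ratio."""
    ratio = np.exp(log_ratio)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Pessimistic (elementwise minimum) bound, averaged over the batch.
    return np.mean(np.minimum(unclipped, clipped))


def penalty_surrogate(log_ratio, advantages, beta=1.0):
    """Importance-weighted surrogate minus a KL penalty between path measures.

    The paths are assumed to be sampled from the old policy, so the mean of
    -log_ratio is a Monte Carlo estimate of KL(old || new).
    """
    ratio = np.exp(log_ratio)
    kl_estimate = np.mean(-log_ratio)
    return np.mean(ratio * advantages) - beta * kl_estimate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical per-step log-probabilities: 4 trajectories, 5 steps each.
    old_logp = rng.normal(loc=-1.0, scale=0.1, size=(4, 5))
    new_logp = old_logp + rng.normal(scale=0.05, size=(4, 5))
    adv = rng.normal(size=4)

    lr = path_log_ratio(new_logp, old_logp)
    print("clip surrogate:   ", clip_surrogate(lr, adv))
    print("penalty surrogate:", penalty_surrogate(lr, adv))
```

Both surrogates are written as quantities to maximize. Because the path log-ratio sums over all generation steps, it can drift from zero faster than a single-action ratio, which is one reason the choice between a hard clip and a soft penalty may matter more in path space than in action space.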

Taxonomy

Topics: Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning