Proximal Policy Optimization in Path Space: A Schrödinger Bridge Perspective
Yuehu Gong, Zeyuan Wang, Yulin Chen, Yanwei Fu

TL;DR
This paper introduces GSB-PPO, a path-space reinforcement learning framework inspired by Schrödinger Bridge theory, which improves the stability and performance of generative policies compared to traditional action-space PPO.
Contribution
It extends PPO to trajectory-level generative policies using Schrödinger Bridge concepts, providing a unified and more stable on-policy optimization method.
Findings
The penalty-based objective (GSB-PPO-Penalty) is more stable and performs better than the clipping-based objective (GSB-PPO-Clip).
Path-space proximal regularization enhances training of generative policies.
Framework unifies trajectory-level and action-level policy optimization.
Abstract
On-policy reinforcement learning with generative policies is promising but remains underexplored. A central challenge is that proximal policy optimization (PPO) is traditionally formulated in terms of action-space probability ratios, whereas diffusion- and flow-based policies are more naturally represented as trajectory-level generative processes. In this work, we propose GSB-PPO, a path-space formulation of generative PPO inspired by the Generalized Schrödinger Bridge (GSB). Our framework lifts PPO-style proximal updates from terminal actions to full generation trajectories, yielding a unified view of on-policy optimization for generative policies. Within this framework, we develop two concrete objectives: a clipping-based objective, GSB-PPO-Clip, and a penalty-based objective, GSB-PPO-Penalty. Experimental results show that while both objectives are compatible with on-policy…
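To make the contrast between the two objectives concrete, here is a minimal NumPy sketch of what a path-space clipped surrogate and a path-space penalized surrogate could look like. It is an illustration under stated assumptions, not the paper's implementation: it assumes the generative policy is a Markov chain with Gaussian transitions of shared, fixed standard deviation, and all function names and the parameters `eps` and `beta` are hypothetical. The path-space ratio is taken to be the product of per-step transition ratios over the whole generation trajectory, rather than a ratio of terminal-action densities.

```python
import numpy as np

def gaussian_logpdf(x, mean, std):
    """Log-density of an isotropic Gaussian, summed over the last axis."""
    var = std ** 2
    return np.sum(-0.5 * (x - mean) ** 2 / var - 0.5 * np.log(2 * np.pi * var), axis=-1)

def path_log_ratio(path, mean_new, mean_old, std):
    """Log of the new/old path-probability ratio (assumed Gaussian transitions).

    path:     (B, K+1, D) generation trajectory x_0..x_K sampled from the old policy
    mean_*:   (B, K, D) transition means predicted for x_1..x_K
    """
    nxt = path[:, 1:, :]
    lp_new = gaussian_logpdf(nxt, mean_new, std)   # (B, K)
    lp_old = gaussian_logpdf(nxt, mean_old, std)   # (B, K)
    return np.sum(lp_new - lp_old, axis=1)         # (B,)

def path_kl_gaussian(mean_new, mean_old, std):
    """Per-path KL(old || new): sum over steps of ||mu_old - mu_new||^2 / (2 std^2)."""
    return np.sum((mean_old - mean_new) ** 2, axis=(1, 2)) / (2 * std ** 2)

def clip_objective(log_ratio, adv, eps=0.2):
    """PPO-style clipped surrogate, applied to the path-space ratio."""
    ratio = np.exp(log_ratio)
    clipped = np.clip(ratio, 1 - eps, 1 + eps)
    return np.mean(np.minimum(ratio * adv, clipped * adv))

def penalty_objective(log_ratio, adv, path_kl, beta=1.0):
    """Penalized surrogate: importance-weighted advantage minus beta * path KL."""
    return np.mean(np.exp(log_ratio) * adv - beta * path_kl)

# Usage on random data (hypothetical shapes: 4 paths, 5 steps, 3 dimensions).
rng = np.random.default_rng(0)
path = rng.normal(size=(4, 6, 3))
mean_old = rng.normal(size=(4, 5, 3))
mean_new = mean_old + 0.05 * rng.normal(size=(4, 5, 3))
adv = rng.normal(size=4)
lr = path_log_ratio(path, mean_new, mean_old, std=0.5)
j_clip = clip_objective(lr, adv)
j_pen = penalty_objective(lr, adv, path_kl_gaussian(mean_new, mean_old, std=0.5))
```

One design point the sketch makes visible: a path-space ratio multiplies many per-step ratios, so it can move far from 1 even when each step changes only slightly. Hard clipping of that product can therefore bind often, whereas a KL penalty degrades smoothly. Whether this is the mechanism behind the reported stability gap is not stated in the text above.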
Taxonomy
Topics: Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning
