Central Path Proximal Policy Optimization

Nikola Milosevic; Johannes Müller; Nico Scherf

arXiv:2506.00700 · cs.LG · August 18, 2025

TL;DR

This paper introduces C3PO, a modification of the PPO loss that keeps policy iterates close to the central path of a barrier method, enforcing constraints more tightly without sacrificing policy return.

Contribution

C3PO is a new method that modifies the PPO loss so that policy iterates stay close to the central path of the constrained optimization problem, improving constraint enforcement in constrained Markov decision processes (CMDPs).
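
To make the idea of "adding a barrier to the PPO loss" concrete, here is a minimal Python sketch of one way a log-barrier term could be attached to PPO's clipped surrogate. It is not the C3PO loss, which this summary does not spell out. The function names, the first-order cost estimate `cost_old + mean(ratio * cost_advantage)`, and the barrier weight `t` are all illustrative assumptions.

```python
import math
from typing import Sequence


def clipped_surrogate(ratios: Sequence[float], advantages: Sequence[float],
                      eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate (to be maximized), averaged over samples."""
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = min(max(r, 1.0 - eps), 1.0 + eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


def barrier_surrogate(ratios: Sequence[float], advantages: Sequence[float],
                      cost_advantages: Sequence[float], cost_old: float,
                      cost_limit: float, t: float = 10.0, eps: float = 0.2) -> float:
    """Hypothetical barrier-augmented PPO objective (NOT the C3PO loss).

    Assumption: the new policy's expected cost is approximated to first order by
    an importance-weighted surrogate, cost_old + mean(ratio * cost_advantage).
    A log-barrier on the remaining slack keeps iterates strictly feasible;
    a larger t weights the return term more, allowing iterates nearer the boundary.
    """
    cost_estimate = cost_old + sum(
        r * ac for r, ac in zip(ratios, cost_advantages)) / len(ratios)
    slack = cost_limit - cost_estimate
    if slack <= 0.0:
        return -math.inf  # outside the barrier's domain: estimated constraint violated
    return t * clipped_surrogate(ratios, advantages, eps) + math.log(slack)


# Example with made-up numbers: two samples, zero cost advantages, cost budget 1.0.
print(barrier_surrogate([1.0, 1.5], [1.0, 1.0], [0.0, 0.0],
                        cost_old=0.0, cost_limit=1.0))
```

Returning `-inf` outside the feasible region mirrors the barrier's domain; a practical version would also need a strictly feasible starting policy or a fallback such as a line search, which is a further assumption of this sketch.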

Findings

01. C3PO achieves tighter constraint adherence than standard PPO.
02. C3PO maintains or improves final policy return.
03. C3PO demonstrates promising results in constrained environments.

Abstract

In constrained Markov decision processes, enforcing constraints during training is often thought to decrease the final return. Recently, it was shown that constraints can be incorporated directly into the policy geometry, yielding an optimization trajectory close to the central path of a barrier method, which does not compromise final return. Building on this idea, we introduce Central Path Proximal Policy Optimization (C3PO), a simple modification of the PPO loss that produces policy iterates that stay close to the central path of the constrained optimization problem. Compared to existing on-policy methods, C3PO delivers improved performance with tighter constraint enforcement, suggesting that central path-guided updates offer a promising direction for constrained policy optimization.
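
The central path mentioned here comes from interior-point (barrier) methods: for a problem "maximize f(x) subject to g(x) <= b", one maximizes t * f(x) + log(b - g(x)) for increasing t, and the maximizers x*(t) form the central path. Each point on it is strictly feasible, and for a concave problem with m constraints its suboptimality is at most m/t, so the path approaches the constrained optimum as t grows. The sketch below traces this path on a hypothetical one-dimensional toy problem, f(x) = 2x - x^2 with constraint x <= 0.5, chosen only for illustration; it is not taken from the paper and does not model a policy.

```python
import math

# Hypothetical toy problem (not from the paper):
#   maximize f(x) = 2x - x**2   subject to   x <= B = 0.5
# The unconstrained optimum x = 1 is infeasible, so the constrained optimum is x = 0.5.
B = 0.5


def objective(x: float) -> float:
    return 2.0 * x - x * x


def barrier_grad(x: float, t: float) -> float:
    """Derivative of the barrier objective t * f(x) + log(B - x)."""
    return t * (2.0 - 2.0 * x) - 1.0 / (B - x)


def central_point(t: float, tol: float = 1e-12) -> float:
    """Return x*(t), the maximizer of t * f(x) + log(B - x), for t > 0.

    The barrier objective is strictly concave on x < B, so its derivative is
    strictly decreasing there and has exactly one root, found by bisection.
    """
    hi = B - 1e-15  # just inside the boundary, where the derivative is negative
    lo = -1.0
    while barrier_grad(lo, t) <= 0.0:  # move left until the derivative is positive
        lo *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if barrier_grad(mid, t) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for t in (1.0, 10.0, 100.0, 1000.0):
    x = central_point(t)
    gap = objective(B) - objective(x)  # suboptimality, bounded by 1/t (one constraint)
    print(f"t={t:7.1f}  x*(t)={x:.6f}  gap={gap:.6f}  bound={1.0 / t:.6f}")
```

As t increases, the printed points move toward the boundary x = 0.5 from inside the feasible set, and the gap stays below 1/t. A method whose iterates track this curve never leaves the feasible region, which is the property the abstract attributes to central path-guided updates.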

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Advanced Bandit Algorithms Research