Rectified Robust Policy Optimization for Model-Uncertain Constrained Reinforcement Learning without Strong Duality

Shaocong Ma; Ziyi Chen; Yi Zhou; Heng Huang

arXiv:2508.17448 · cs.LG · September 23, 2025


TL;DR

This paper introduces RRPO, a primal-only algorithm for robust constrained reinforcement learning that provably converges to near-optimal feasible policies without relying on strong duality. The approach is supported by theoretical analysis and grid-world experiments.

Contribution

Proposes RRPO, a novel primal-only method for robust constrained RL that does not depend on strong duality (which generally fails in this setting) and comes with convergence guarantees.
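To make the duality issue concrete, one common way to write a robust constrained problem and its Lagrangian dual is sketched below. The symbols are our own notation and may not match the paper's: $V_r^{\pi}(P)$ and $V_c^{\pi}(P)$ are the reward and constraint values of policy $\pi$ under transition kernel $P$, $\mathcal{P}$ is the uncertainty set, and $b$ is the constraint threshold. The constraint is written as a utility lower bound, though a cost upper bound is equally common.

```latex
% Primal problem: maximize worst-case reward subject to a worst-case constraint
p^\star = \max_{\pi} \; \min_{P \in \mathcal{P}} V_r^{\pi}(P)
  \quad \text{s.t.} \quad \min_{P \in \mathcal{P}} V_c^{\pi}(P) \ge b

% Lagrangian with multiplier \lambda \ge 0
L(\pi, \lambda) = \min_{P \in \mathcal{P}} V_r^{\pi}(P)
  + \lambda \Big( \min_{P \in \mathcal{P}} V_c^{\pi}(P) - b \Big)

% Primal and dual values; weak duality always gives d^\star \ge p^\star
p^\star = \max_{\pi} \min_{\lambda \ge 0} L(\pi, \lambda), \qquad
d^\star = \min_{\lambda \ge 0} \max_{\pi} L(\pi, \lambda) \ge p^\star
```

Strong duality means $d^\star = p^\star$. When the gap $d^\star - p^\star$ is strictly positive, $L$ has no saddle point, so the policies a primal-dual method tracks need not be optimal or even feasible. For a single known kernel (a standard constrained MDP) the gap is known to be zero under Slater's condition; the paper's claim is that adding the inner minimization over $\mathcal{P}$ breaks this in general.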

Findings

01. RRPO converges to approximately optimal feasible policies.
02. RRPO is safer than non-robust methods under model uncertainty.
03. Empirical results in grid-world environments validate the effectiveness of RRPO.

Abstract

The goal of robust constrained reinforcement learning (RL) is to optimize an agent's performance under the worst-case model uncertainty while satisfying safety or resource constraints. In this paper, we demonstrate that strong duality does not generally hold in robust constrained RL, indicating that traditional primal-dual methods may fail to find optimal feasible policies. To overcome this limitation, we propose a novel primal-only algorithm called Rectified Robust Policy Optimization (RRPO), which operates directly on the primal problem without relying on dual formulations. We provide theoretical convergence guarantees under mild regularity assumptions, showing convergence to an approximately optimal feasible policy with iteration complexity matching the best-known lower bound when the uncertainty set diameter is controlled at a specific level. Empirical results in a grid-world…
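The abstract describes RRPO only at a high level, so the Python sketch below is not the authors' algorithm. It is a hypothetical illustration of two ingredients a primal-only robust method plausibly needs: worst-case policy evaluation over an uncertainty set, and an update that acts on the primal problem directly. Here that update is a simple switching rule (step on the constraint when it is violated, otherwise on the reward) of the kind used by earlier primal-only safe-RL methods such as CRPO. Assumptions, all ours: a small tabular MDP, a finite uncertainty set of transition kernels, exact evaluation by a linear solve, a softmax policy, finite-difference gradients, and a constraint written as a utility lower bound `b`. All function names are hypothetical.

```python
import numpy as np

# Hypothetical sketch, NOT the paper's RRPO: worst-case evaluation over a
# finite set of kernels plus a primal-only switching update.

def policy_value(P, R, pi, gamma):
    """Exact discounted value of a stochastic policy in a tabular MDP.
    P: (S, A, S) transition kernel, R: (S, A) per-step signal, pi: (S, A)."""
    S = R.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state kernel under pi
    r_pi = np.sum(pi * R, axis=1)            # expected one-step signal under pi
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

def worst_case(kernels, R, pi, gamma, rho):
    """Worst-case expected value (start distribution rho) over a FINITE
    uncertainty set; an assumption, since real sets are often continuous."""
    return min(float(rho @ policy_value(P, R, pi, gamma)) for P in kernels)

def softmax_policy(theta):
    """Row-wise softmax: one action distribution per state."""
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def fd_grad(f, theta, eps=1e-5):
    """Central finite-difference gradient. The min over kernels is non-smooth,
    so this is a numerical estimate along the currently active kernel."""
    theta = np.asarray(theta, dtype=float)
    g = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        d = np.zeros_like(theta)
        d[idx] = eps
        g[idx] = (f(theta + d) - f(theta - d)) / (2 * eps)
    return g

def primal_only_step(theta, kernels, R, C, b, gamma, rho, lr=0.5, tol=0.0):
    """One switching update: ascend the worst-case utility if it is below
    b - tol, otherwise ascend the worst-case reward. No dual variable."""
    def reward(th):
        return worst_case(kernels, R, softmax_policy(th), gamma, rho)

    def utility(th):
        return worst_case(kernels, C, softmax_policy(th), gamma, rho)

    target = utility if utility(theta) < b - tol else reward
    return theta + lr * fd_grad(target, theta)
```

In this toy setting, calling `primal_only_step` in a loop with a fixed step size makes the update alternate between the two objectives, so the iterates chatter around the constraint boundary rather than settling exactly on it.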


Taxonomy

Topics: Reinforcement Learning in Robotics