Provably Efficient Sample Complexity for Robust CMDP

Sourav Ganguly; Arnob Ghosh

arXiv:2511.07486·cs.LG·November 12, 2025

TL;DR

This paper establishes a sample complexity guarantee for robust constrained Markov decision processes (RCMDPs). It proposes an augmented state space and a robust constrained value iteration algorithm that returns near-optimal policies satisfying the safety constraints up to a small violation.

Contribution

The paper presents the first sample complexity guarantee for RCMDPs, introducing an augmented state space and a robust constrained value iteration algorithm.
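As a rough illustration of the augmented state space, which the abstract describes as incorporating the remaining utility budget into the state representation, here is a minimal Python sketch. The names `AugmentedState`, `augment_step`, and `constraint_met` are hypothetical, and the update rule (subtracting the utility collected at each step from the budget) is an assumption made for illustration, not the paper's stated construction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AugmentedState:
    """Hypothetical augmented state: environment state plus remaining utility budget."""
    state: int
    remaining_budget: float


def augment_step(aug: AugmentedState, next_state: int, utility: float) -> AugmentedState:
    # Assumed update rule: utility collected at this step reduces the amount
    # still required to reach the threshold.
    return AugmentedState(next_state, aug.remaining_budget - utility)


def constraint_met(aug: AugmentedState) -> bool:
    # The episode-level constraint holds once no utility is still owed.
    return aug.remaining_budget <= 0.0


# Illustrative episode with a utility threshold of 1.5.
start = AugmentedState(state=0, remaining_budget=1.5)
after_one = augment_step(start, next_state=2, utility=1.0)
after_two = augment_step(after_one, next_state=1, utility=0.5)
```

Carrying the budget in the state lets a policy condition on how much utility is still owed, which a policy that sees only the original state cannot do.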

Findings

01

Achieves a sample complexity of |S||A|H^5/ε^2 with ε violation.

02

Demonstrates the effectiveness of the proposed algorithm through empirical validation.

03

Highlights that Markovian policies may be suboptimal in RCMDPs under certain uncertainty sets.
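To make the |S||A|H^5/ε^2 scaling in the first finding concrete, the sketch below evaluates only the leading-order term. The function name `sample_bound` and the example sizes are hypothetical, and constants and logarithmic factors are omitted, so the values indicate relative growth rather than actual sample counts.

```python
def sample_bound(num_states: int, num_actions: int, horizon: int, eps: float) -> float:
    # Leading-order term |S||A|H^5 / eps^2 only.
    # Constants and log factors are omitted (assumption for illustration).
    return num_states * num_actions * horizon**5 / eps**2


base = sample_bound(num_states=10, num_actions=4, horizon=10, eps=0.1)
halved_eps = sample_bound(num_states=10, num_actions=4, horizon=10, eps=0.05)
doubled_horizon = sample_bound(num_states=10, num_actions=4, horizon=20, eps=0.1)
```

Halving ε quadruples the term, while doubling the horizon H multiplies it by 2^5 = 32, so the horizon dominates the cost in long-horizon problems.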

Abstract

We study the problem of learning policies that maximize cumulative reward while satisfying safety constraints, even when the real environment differs from a simulator or nominal model. We focus on robust constrained Markov decision processes (RCMDPs), where the agent must maximize reward while ensuring cumulative utility exceeds a threshold under the worst-case dynamics within an uncertainty set. While recent works have established finite-time iteration complexity guarantees for RCMDPs using policy optimization, their sample complexity guarantees remain largely unexplored. In this paper, we first show that Markovian policies may fail to be optimal even under rectangular uncertainty sets, unlike in the unconstrained robust MDP. To address this, we introduce an augmented state space that incorporates the remaining utility budget into the state representation. Building on this…
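The RCMDP problem described in the abstract can be sketched as the following optimization. The notation is assumed here and not taken from the paper: \pi is a policy, \mathcal{P} the uncertainty set of transition models, V_r and V_g the cumulative reward and utility values from the initial state s_1, and b the utility threshold. Evaluating the reward objective under the worst case as well is also an assumption.

```latex
\max_{\pi} \; \min_{P \in \mathcal{P}} V_r^{\pi, P}(s_1)
\quad \text{subject to} \quad
\min_{P \in \mathcal{P}} V_g^{\pi, P}(s_1) \ge b
```

The constraint must hold for every model in \mathcal{P}, which is what distinguishes the robust setting from a standard constrained MDP evaluated on the nominal model alone.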

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research