Data-driven robust Markov decision processes on Borel spaces: performance guarantees via an axiomatic approach

Sivaramakrishnan Ramani

arXiv:2603.08979·math.OC·March 11, 2026

Open Access

TL;DR

This paper develops a data-driven approach for robust Markov decision processes on Borel spaces, providing performance guarantees and convergence analysis using an axiomatic framework for ambiguity sets based on distribution distances.

Contribution

The paper introduces an axiomatic approach to constructing ambiguity sets in RMDPs and uses it to establish convergence, probabilistic convergence rates, sample complexity bounds, and finite-sample performance guarantees.

Findings

01

The robust optimal value function and the out-of-sample value function both converge to the true optimal value function as the sample size increases.

02

For finite sample sizes, the robust optimal value function is a high-probability upper bound on the out-of-sample value function.

03

Several well-known distances satisfy the concentration inequalities needed for performance guarantees.

Abstract

We consider Markov decision processes (MDPs) with unknown disturbance distribution and address this problem using the robust Markov decision process (RMDP) approach. We construct the empirical distribution of the unknown disturbance distribution and characterize our ambiguity set of distributions as the sublevel set of a nonnegative distance function from the empirical distribution. By connecting the weak convergence of distributions to convergence with respect to the distance function, we prove that the robust optimal value function and the out-of-sample value function converge to the true optimal value function with increasing sample-sizes. We establish that, for finite sample-sizes, the robust optimal value function serves as a high probability upper bound on the out-of-sample value function. We also obtain probabilistic convergence rates, sample complexity bounds, and…
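As a rough illustration of the construction the abstract describes, the following Python sketch works through a finite toy version: it builds an empirical distribution from disturbance samples, takes the ambiguity set to be a total variation ball around it, and runs robust value iteration. The paper itself works on Borel spaces with a general distance function. The finite spaces, the total variation distance, the cost-minimizing discounted setting, and all names used here (`empirical_distribution`, `worst_case_expectation`, `robust_value_iteration`, `next_state`, `cost`, `radius`) are assumptions made for illustration, not the author's method.

```python
import numpy as np


def empirical_distribution(samples, support_size):
    """Empirical distribution of integer-coded disturbance samples."""
    counts = np.bincount(samples, minlength=support_size)
    return counts / counts.sum()


def worst_case_expectation(p_hat, values, radius):
    """Maximize E_q[values] over q with TV(q, p_hat) <= radius.

    TV(q, p) = 0.5 * sum(|q - p|). On a finite support the maximizer
    moves up to `radius` probability mass from the lowest-valued
    outcomes onto the highest-valued outcome.
    """
    q = np.asarray(p_hat, dtype=float).copy()
    top = int(np.argmax(values))
    budget = min(radius, 1.0 - q[top])  # cannot move more mass than exists elsewhere
    q[top] += budget
    for i in np.argsort(values):  # lowest values first
        if budget <= 0:
            break
        if i == top:
            continue
        take = min(q[i], budget)
        q[i] -= take
        budget -= take
    return float(q @ values)


def robust_value_iteration(next_state, cost, p_hat, radius, gamma=0.9, iters=500):
    """Robust value iteration for a cost-minimizing toy problem (assumed setting).

    next_state[s, a, w] is the successor state and cost[s, a, w] the stage
    cost when disturbance w occurs after action a in state s.
    """
    n_states, n_actions, _ = cost.shape
    v = np.zeros(n_states)
    for _ in range(iters):
        q_values = np.empty((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                # Cost-to-go for each disturbance outcome.
                targets = cost[s, a] + gamma * v[next_state[s, a]]
                # Adversary picks the worst distribution in the ambiguity set.
                q_values[s, a] = worst_case_expectation(p_hat, targets, radius)
        v = q_values.min(axis=1)  # controller minimizes the worst-case cost
    return v
```

With `radius=0` the ambiguity set collapses to the empirical distribution and the routine reduces to ordinary value iteration under that distribution. In this cost setting, enlarging `radius` can only raise the returned values, which is the mechanism behind using the robust value as a conservative bound.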

Taxonomy

Topics: Reinforcement Learning in Robotics · Risk and Portfolio Optimization · Distributed Sensor Networks and Detection Algorithms