Data-driven robust Markov decision processes on Borel spaces: performance guarantees via an axiomatic approach
Sivaramakrishnan Ramani

TL;DR
This paper develops a data-driven approach for robust Markov decision processes on Borel spaces, providing performance guarantees and convergence analysis using an axiomatic framework for ambiguity sets based on distribution distances.
Contribution
The paper introduces an axiomatic approach to constructing ambiguity sets in RMDPs and uses it to establish convergence, probabilistic bounds, and finite-sample performance guarantees.
Findings
The robust optimal value function and the out-of-sample value function converge to the true optimal value function as the sample size increases.
For finite sample sizes, the robust optimal value function is a high-probability upper bound on the out-of-sample value function.
Several well-known distances satisfy the concentration inequalities needed for performance guarantees.
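The upper-bound finding can be illustrated with a minimal sketch on a finite disturbance set. Everything here is an assumption made for illustration and is not taken from the paper: the function names `empirical_distribution` and `worst_case_expectation` are hypothetical, total variation is chosen as the distance only because its worst case has a simple greedy form, and the ambiguity set is restricted to distributions supported on the observed samples.

```python
from collections import Counter


def empirical_distribution(samples):
    """Return {atom: relative frequency} for a list of hashable samples."""
    n = len(samples)
    return {w: count / n for w, count in Counter(samples).items()}


def worst_case_expectation(cost, p_hat, radius):
    """Maximize E_q[cost] over q on the support of p_hat with TV(q, p_hat) <= radius.

    TV(q, p) = 0.5 * sum_w |q(w) - p(w)|. The maximizer shifts up to `radius`
    probability mass from the cheapest atoms onto the most expensive atom.
    Illustrative assumption: q is restricted to the empirical support.
    """
    atoms = sorted(p_hat, key=cost)  # ascending cost
    q = dict(p_hat)
    worst = atoms[-1]
    budget = radius
    for w in atoms[:-1]:
        if budget <= 0:
            break
        moved = min(q[w], budget)  # cannot remove more mass than the atom holds
        q[w] -= moved
        q[worst] += moved
        budget -= moved
    value = sum(q[w] * cost(w) for w in atoms)
    return value, q


# Usage: the robust (worst-case) expected cost is never below the nominal one.
p_hat = empirical_distribution([0, 0, 1, 2])
nominal, _ = worst_case_expectation(lambda w: float(w), p_hat, 0.0)
robust, q_star = worst_case_expectation(lambda w: float(w), p_hat, 0.25)
```

With radius 0 the function returns the plain empirical expectation. A positive radius can only raise the value, which is the mechanism behind using the robust value as a conservative estimate of the cost.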
Abstract
We consider Markov decision processes (MDPs) with unknown disturbance distribution and address this problem using the robust Markov decision process (RMDP) approach. We construct the empirical distribution of the unknown disturbance distribution and characterize our ambiguity set of distributions as the sublevel set of a nonnegative distance function from the empirical distribution. By connecting the weak convergence of distributions to convergence with respect to the distance function, we prove that the robust optimal value function and the out-of-sample value function converge to the true optimal value function with increasing sample-sizes. We establish that, for finite sample-sizes, the robust optimal value function serves as a high probability upper bound on the out-of-sample value function. We also obtain probabilistic convergence rates, sample complexity bounds, and…
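The construction described in the abstract can be written out as a sketch under assumed notation. None of the symbols below appear in the text above: $\xi_1,\dots,\xi_N$ are the observed disturbance samples, $d$ is the distance function, $\theta_N$ is the radius of the sublevel set, $c$ is a stage cost, $F$ is the system dynamics, and $\alpha \in (0,1)$ is a discount factor. The discounted cost-minimization form is itself an assumption, chosen to be consistent with the robust value acting as an upper bound.

```latex
\hat{\mu}_N = \frac{1}{N}\sum_{i=1}^{N} \delta_{\xi_i},
\qquad
\mathcal{P}_N = \bigl\{ \mu : d(\mu, \hat{\mu}_N) \le \theta_N \bigr\},
\qquad
V_N(x) = \inf_{a \in A(x)} \; \sup_{\mu \in \mathcal{P}_N}
  \int \bigl[ c(x, a, \xi) + \alpha \, V_N\bigl(F(x, a, \xi)\bigr) \bigr] \, \mu(d\xi).
```

Here $\mathcal{P}_N$ is the ambiguity set centered at the empirical distribution, and $V_N$ is the robust optimal value function obtained by optimizing against the worst distribution in that set.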
Taxonomy
Topics: Reinforcement Learning in Robotics · Risk and Portfolio Optimization · Distributed Sensor Networks and Detection Algorithms
