Provably Efficient Algorithms for S- and Non-Rectangular Robust MDPs with General Parameterization
Anirudh Satheesh, Ziyi Chen, Furong Huang, Heng Huang

TL;DR
This paper develops efficient algorithms for robust Markov decision processes with general policy parameterizations, extending theoretical guarantees and improving sample complexity bounds for both discounted and average reward settings.
Contribution
It introduces novel algorithms with provable sample complexity guarantees for RMDPs beyond tabular policies, including infinite state spaces and non-rectangular uncertainty sets.
Findings
Introduces a multilevel Monte Carlo gradient estimator with $\tilde{O}(\frac{1}{\epsilon^2})$ complexity.
Provides the first sample complexity guarantees for RMDPs with general policy parameterization beyond $(s,a)$-rectangularity.
Achieves significant improvements in sample complexity bounds for both discounted and average reward robust MDPs.
Abstract
We study robust Markov decision processes (RMDPs) with general policy parameterization under s-rectangular and non-rectangular uncertainty sets. Prior work is largely limited to tabular policies, and hence either lacks sample complexity guarantees or incurs high computational cost. Our method reduces average-reward RMDPs to entropy-regularized discounted robust MDPs, restoring strong duality and enabling tractable equilibrium computation. We prove novel Lipschitz and Lipschitz-smoothness properties for general policy parameterizations that extend to infinite state spaces. To address infinite-horizon gradient estimation, we introduce a multilevel Monte Carlo gradient estimator with $\tilde{O}(\frac{1}{\epsilon^2})$ sample complexity, an improvement over prior work. Building on this, we design a projected gradient descent algorithm for…
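As a rough illustration of the multilevel Monte Carlo idea mentioned in the abstract, the sketch below shows a generic randomized-level estimator applied to a truncated discounted return. It is not the paper's estimator: the names `mlmc_estimate`, `truncated_return`, and `max_level`, the geometric level distribution, and the toy constant-reward setting are all assumptions made for this example. The point it shows is that a random truncation level with an importance-weighted correction matches, in expectation, the estimate at the deepest level, while the expected work grows only linearly in `max_level` instead of like `2**max_level`.

```python
import numpy as np


def mlmc_estimate(level_estimator, rng, max_level):
    """Generic randomized multilevel Monte Carlo estimate (illustrative only).

    level_estimator(j) is assumed to return an estimate that costs about
    2**j units of work, e.g. a rollout truncated at horizon 2**j.
    """
    # Draw the level N with P(N = n) = 2**(-n) for n = 1, 2, ...
    n = rng.geometric(0.5)
    estimate = level_estimator(0)
    if n <= max_level:
        # Weight the level-n correction by 1 / P(N = n) = 2**n so that the
        # corrections telescope in expectation to level_estimator(max_level).
        # In a stochastic setting the two levels would normally share samples.
        correction = level_estimator(n) - level_estimator(n - 1)
        estimate = estimate + (2.0 ** n) * correction
    return estimate


def truncated_return(j, gamma=0.9, reward=1.0):
    """Toy level estimator: discounted return of a constant reward,
    truncated after 2**j steps (hypothetical stand-in for a rollout)."""
    horizon = 2 ** j
    return reward * (1.0 - gamma ** horizon) / (1.0 - gamma)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    draws = [mlmc_estimate(truncated_return, rng, max_level=10) for _ in range(100_000)]
    print("MLMC average:        ", np.mean(draws))
    print("deepest-level target:", truncated_return(10))
```

Most draws stop at a shallow level and are cheap. The rare deep draws carry a large weight, which keeps the average aligned with the deepest level at the price of extra variance.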
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques
