Provably Efficient Algorithms for S- and Non-Rectangular Robust MDPs with General Parameterization

Anirudh Satheesh; Ziyi Chen; Furong Huang; Heng Huang

arXiv:2602.11387·cs.LG·February 13, 2026

Open Access

TL;DR

This paper develops efficient algorithms for robust Markov decision processes with general policy parameterizations, extending theoretical guarantees and improving sample complexity bounds for both discounted and average reward settings.

Contribution

It introduces novel algorithms with provable sample complexity guarantees for RMDPs beyond tabular policies, including infinite state spaces and non-rectangular uncertainty sets.

Findings

01

Introduces a multilevel Monte Carlo gradient estimator with $\tilde{O}(ϵ^{-2})$ sample complexity.

02

Provides the first sample complexity guarantees for RMDPs with general policy parameterization beyond $(s,a)$-rectangularity.

03

Achieves significant improvements in sample complexity bounds for both discounted and average reward robust MDPs.

Abstract

We study robust Markov decision processes (RMDPs) with general policy parameterization under s-rectangular and non-rectangular uncertainty sets. Prior work is largely limited to tabular policies, and hence either lacks sample complexity guarantees or incurs high computational cost. Our method reduces average reward RMDPs to entropy-regularized discounted robust MDPs, restoring strong duality and enabling tractable equilibrium computation. We prove novel Lipschitz and Lipschitz-smoothness properties for general policy parameterizations that extend to infinite state spaces. To address infinite-horizon gradient estimation, we introduce a multilevel Monte Carlo gradient estimator with $\tilde{O}(ϵ^{-2})$ sample complexity, a factor of $O(ϵ^{-2})$ improvement over prior work. Building on this, we design a projected gradient descent algorithm for…
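
The multilevel Monte Carlo (MLMC) idea mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the paper's estimator: it is a generic randomized-level MLMC construction, assuming a base estimator whose bias shrinks as it sees more samples (here, the square of a sample mean). The names `plug_in`, `mlmc_draw`, and `draw_sample` are hypothetical. A level N is drawn with probability 2^{-N}, and the difference between a fine and a coarse estimate is reweighted by 2^N, so that one draw has the expectation of the estimator built from 2^{max_level} samples while using only about max_level samples on average.

```python
import random


def plug_in(samples):
    """Biased base estimator: square of the sample mean.

    For i.i.d. samples with mean mu and variance v, its expectation is
    mu**2 + v / len(samples), so the bias halves when the sample count doubles.
    """
    mean = sum(samples) / len(samples)
    return mean * mean


def mlmc_draw(draw_sample, max_level, rng):
    """Return (estimate, samples_used) for one randomized-level MLMC draw."""
    # Draw level N with P(N = n) = 2**-n for n = 1, 2, ...
    level = 1
    while rng.random() < 0.5:
        level += 1

    # Levels above the cap are truncated: fall back to a single-sample estimate.
    if level > max_level:
        return plug_in([draw_sample(rng)]), 1

    samples = [draw_sample(rng) for _ in range(2 ** level)]
    base = plug_in(samples[:1])                      # estimate from 1 sample
    fine = plug_in(samples)                          # estimate from 2**level samples
    coarse = plug_in(samples[: 2 ** (level - 1)])    # estimate from the first half

    # Reweighting by 1 / P(N = level) makes the corrections telescope in expectation.
    return base + (2 ** level) * (fine - coarse), len(samples)


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [mlmc_draw(lambda r: r.gauss(1.0, 1.0), 6, rng) for _ in range(20000)]
    print("mean estimate:", sum(d[0] for d in draws) / len(draws))
    print("mean samples per draw:", sum(d[1] for d in draws) / len(draws))
```

In this toy setting the true target is mu^2 = 1, and the expected value of one draw is mu^2 + 2^{-max_level}, so the remaining bias decays geometrically in `max_level` while the expected cost per draw grows only linearly in it. The paper's setting (infinite-horizon policy gradients in robust MDPs) differs in what the base estimator is; only the level-randomization pattern is shown here.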

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques