Near-Optimal Sample Complexities of Divergence-based S-rectangular Distributionally Robust Reinforcement Learning
Zhenghao Li, Shengbo Wang, Nian Si

TL;DR
This paper establishes near-optimal sample complexity bounds for the empirical value iteration algorithm in divergence-based S-rectangular distributionally robust reinforcement learning (DR-RL) and supports them with numerical experiments.
Contribution
It provides the first near-optimal sample complexity results for divergence-based S-rectangular DR-RL, matching optimal dependence on key parameters.
Findings
Sample complexity bounds of $\tilde{O}(|S||A|(1-\gamma)^{-4}\varepsilon^{-2})$ established.
Numerical experiments confirm fast learning performance of the proposed algorithm.
Results demonstrate the effectiveness of S-rectangular models in robust RL applications.
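To make the scaling in the stated bound concrete, the following is a minimal sketch that evaluates only its leading-order term, $|S||A|(1-\gamma)^{-4}\varepsilon^{-2}$. It assumes all constants and the logarithmic factors hidden by $\tilde{O}$ are dropped, so the returned number is not an actual sample count. The function name `sample_complexity_scaling` and its parameter names are hypothetical and do not come from the paper.

```python
def sample_complexity_scaling(num_states, num_actions, gamma, eps):
    """Leading-order term |S||A|(1-gamma)^-4 eps^-2 of the stated bound.

    Assumption: constants and log factors hidden by the tilde-O notation
    are ignored, so only ratios between two calls are meaningful.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    return num_states * num_actions * (1.0 - gamma) ** -4 * eps ** -2


# Illustrative comparison: how the term grows when accuracy is tightened
# or the discount factor moves closer to 1.
base = sample_complexity_scaling(10, 5, gamma=0.9, eps=0.1)
tighter_eps = sample_complexity_scaling(10, 5, gamma=0.9, eps=0.05)
longer_horizon = sample_complexity_scaling(10, 5, gamma=0.95, eps=0.1)
print(tighter_eps / base)     # ratio from halving eps
print(longer_horizon / base)  # ratio from halving (1 - gamma)
```

Under these assumptions, halving $\varepsilon$ multiplies the term by about 4, while halving the effective horizon gap $1-\gamma$ multiplies it by about 16, which is why the $(1-\gamma)^{-4}$ factor dominates for discount factors close to 1.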
Abstract
Distributionally robust reinforcement learning (DR-RL) has recently gained significant attention as a principled approach that addresses discrepancies between training and testing environments. To balance robustness, conservatism, and computational tractability, the literature has introduced DR-RL models with SA-rectangular and S-rectangular adversaries. While most existing statistical analyses focus on SA-rectangular models, owing to their algorithmic simplicity and the optimality of deterministic policies, S-rectangular models more accurately capture distributional discrepancies in many real-world applications and often yield more effective robust randomized policies. In this paper, we study the empirical value iteration algorithm for divergence-based S-rectangular DR-RL and establish near-optimal sample complexity bounds of…
