A Finite Sample Complexity Bound for Distributionally Robust Q-learning
Shengbo Wang, Nian Si, Jose Blanchet, and Zhengyuan Zhou

TL;DR
This paper establishes the first finite sample complexity bounds for distributionally robust Q-learning in reinforcement learning, demonstrating how to efficiently learn robust policies under environment uncertainty with theoretical guarantees.
Contribution
It extends the robust Q-learning framework with improved estimator design and provides the first known sample complexity bounds for model-free robust reinforcement learning.
Findings
Sample complexity bound depends on the size of the state-action space, the discount factor, the minimal support probability of the transition kernels, and the uncertainty size.
Algorithm achieves epsilon-accuracy with high probability within the derived sample complexity.
Simulation results support the theoretical analysis and effectiveness of the proposed method.
Abstract
We consider a reinforcement learning setting in which the deployment environment is different from the training environment. Applying a robust Markov decision processes formulation, we extend the distributionally robust $Q$-learning framework studied in Liu et al. [2022]. Further, we improve the design and analysis of their multi-level Monte Carlo estimator. Assuming access to a simulator, we prove that the worst-case expected sample complexity of our algorithm to learn the optimal robust $Q$-function within an $\epsilon$ error in the sup norm is upper bounded by $\tilde{O}(|S||A|(1-\gamma)^{-5}\epsilon^{-2}p_{\wedge}^{-6}\delta^{-4})$, where $\gamma$ is the discount rate, $p_{\wedge}$ is the non-zero minimal support probability of the transition kernels and $\delta$ is the uncertainty size. This is the first sample complexity result for the model-free robust RL problem. Simulation…
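To make the robust formulation concrete, the following is a minimal sketch of a single robust Bellman backup. It assumes the uncertainty set is a KL-divergence ball of radius delta around a nominal transition distribution, which the abstract does not state, and it uses the standard dual form of the worst-case expectation, sup over alpha >= 0 of -alpha * log E[exp(-V/alpha)] - alpha * delta. The names `kl_robust_expectation` and `robust_backup` are hypothetical, and the exact evaluation on a known distribution shown here is not the paper's multi-level Monte Carlo estimator.

```python
import math


def kl_robust_expectation(values, probs, delta, iters=200):
    """Worst-case mean of `values` over all distributions within KL radius
    `delta` of `probs`, computed from the dual problem (assumed KL ball)."""
    support = [(v, p) for v, p in zip(values, probs) if p > 0]
    mean = sum(p * v for v, p in support)
    v_min = min(v for v, _ in support)
    if delta <= 0 or mean == v_min:
        return mean  # no uncertainty, or a constant value function

    def dual(alpha):
        # Shift by v_min so the exponentials cannot overflow.
        s = sum(p * math.exp(-(v - v_min) / alpha) for v, p in support)
        return v_min - alpha * math.log(s) - alpha * delta

    # The dual is concave in alpha, and its maximizer is at most
    # (mean - v_min) / delta, so ternary search on this interval suffices.
    lo = 1e-8
    hi = max(lo, (mean - v_min) / delta)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if dual(m1) < dual(m2):
            lo = m1
        else:
            hi = m2
    # The limit alpha -> 0 gives v_min, so the result is never below it.
    return max(dual(0.5 * (lo + hi)), v_min)


def robust_backup(reward, gamma, next_values, probs, delta):
    """One robust Bellman backup for a fixed state-action pair."""
    return reward + gamma * kl_robust_expectation(next_values, probs, delta)


# Illustrative numbers only: two next states with values 0 and 1.
q_robust = robust_backup(1.0, 0.9, [0.0, 1.0], [0.5, 0.5], delta=0.1)
```

With delta = 0 the backup reduces to the ordinary Bellman backup, and as delta grows the worst-case expectation falls toward the smallest value on the support of the nominal distribution.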
Taxonomy
Topics: Distributed Sensor Networks and Detection Algorithms · Reinforcement Learning in Robotics · Statistical Methods and Inference
