A Finite Sample Complexity Bound for Distributionally Robust Q-learning

Shengbo Wang, Nian Si, Jose Blanchet, and Zhengyuan Zhou

arXiv:2302.13203 · cs.LG · August 2, 2024 · 6 cites

TL;DR

This paper proves the first finite sample complexity bound for model-free distributionally robust Q-learning: given access to a simulator, the algorithm learns the optimal robust Q-function to within a prescribed sup-norm error when the deployment environment differs from the training environment.

Contribution

It extends the distributionally robust Q-learning framework of Liu et al. [2022], improves the design and analysis of their multi-level Monte Carlo estimator, and gives the first sample complexity result for model-free robust reinforcement learning.

Findings

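01. The sample complexity bound grows linearly with the number of state-action pairs and polynomially with the inverses of one minus the discount rate, the target error, the minimal support probability of the transition kernels, and the uncertainty size.

02. The algorithm achieves epsilon-accuracy with high probability within the derived sample complexity.

03. Simulation results support the theoretical analysis and the effectiveness of the proposed method.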

Abstract

We consider a reinforcement learning setting in which the deployment environment is different from the training environment. Applying a robust Markov decision processes formulation, we extend the distributionally robust $Q$-learning framework studied in Liu et al. [2022]. Further, we improve the design and analysis of their multi-level Monte Carlo estimator. Assuming access to a simulator, we prove that the worst-case expected sample complexity of our algorithm to learn the optimal robust $Q$-function within an $\epsilon$ error in the sup norm is upper bounded by $\tilde{O}(|S||A|(1-\gamma)^{-5}\epsilon^{-2}p_{\wedge}^{-6}\delta^{-4})$, where $\gamma$ is the discount rate, $p_{\wedge}$ is the non-zero minimal support probability of the transition kernels and $\delta$ is the uncertainty size. This is the first sample complexity result for the model-free robust RL problem. Simulation…
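
To get a feel for how the stated rate behaves, the following minimal sketch evaluates only the leading term $|S||A|(1-\gamma)^{-5}\epsilon^{-2}p_{\wedge}^{-6}\delta^{-4}$. The $\tilde{O}$ notation hides constants and logarithmic factors, so the absolute numbers are not sample counts; only the ratios between settings are meaningful. The function name `bound_scaling`, its parameter names, and the example values are assumptions made for illustration and do not come from the paper.

```python
def bound_scaling(n_states, n_actions, gamma, eps, p_min, delta):
    """Leading term of the stated bound, with constants and log factors dropped.

    Hypothetical helper for illustration only: it returns
    |S||A| (1 - gamma)^-5 eps^-2 p_min^-6 delta^-4.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (0, 1)")
    if not 0.0 < p_min <= 1.0:
        raise ValueError("p_min must lie in (0, 1]")
    if eps <= 0.0 or delta <= 0.0:
        raise ValueError("eps and delta must be positive")
    return (
        n_states
        * n_actions
        * (1.0 - gamma) ** -5
        * eps ** -2
        * p_min ** -6
        * delta ** -4
    )


# Arbitrary baseline values, chosen only to compare relative growth.
base = bound_scaling(10, 4, gamma=0.9, eps=0.1, p_min=0.2, delta=0.5)

# Change one parameter at a time and report the growth factor.
ratios = {
    "halve eps": bound_scaling(10, 4, 0.9, 0.05, 0.2, 0.5) / base,
    "halve 1 - gamma": bound_scaling(10, 4, 0.95, 0.1, 0.2, 0.5) / base,
    "halve p_min": bound_scaling(10, 4, 0.9, 0.1, 0.1, 0.5) / base,
    "halve delta": bound_scaling(10, 4, 0.9, 0.1, 0.2, 0.25) / base,
}

for change, factor in ratios.items():
    print(f"{change}: bound grows by a factor of about {factor:.1f}")
```

Because each parameter enters as a pure power, halving it multiplies the leading term by two raised to the corresponding exponent. Under this reading, the minimal support probability is the most sensitive quantity, followed by the effective horizon $(1-\gamma)^{-1}$.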

Taxonomy

Topics: Distributed Sensor Networks and Detection Algorithms · Reinforcement Learning in Robotics · Statistical Methods and Inference