Model-Free Robust Reinforcement Learning with Sample Complexity Analysis
Yudan Wang, Shaofeng Zou, Yue Wang

TL;DR
This paper introduces a model-free distributionally robust reinforcement learning (DR-RL) algorithm based on Multi-level Monte Carlo (MLMC), with finite sample complexity guarantees for uncertainty sets defined by total variation, Chi-square divergence, and KL divergence.
Contribution
It presents the first model-free DR-RL algorithms with finite sample guarantees for total variation and Chi-square divergence uncertainty sets, and improves the sample complexity of model-free DR-RL for the KL divergence uncertainty set.
Findings
First model-free DR-RL with finite sample guarantees for total variation and Chi-square divergence.
Improved sample complexity for KL divergence-based DR-RL.
Achieves the tightest known sample complexity bounds among model-free DR-RL methods for all three uncertainty models.
Abstract
Distributionally Robust Reinforcement Learning (DR-RL) aims to derive a policy optimizing the worst-case performance within a predefined uncertainty set. Despite extensive research, previous DR-RL algorithms have predominantly favored model-based approaches, with limited availability of model-free methods offering convergence guarantees or sample complexities. This paper proposes a model-free DR-RL algorithm leveraging the Multi-level Monte Carlo (MLMC) technique to close this gap. Our approach integrates a threshold mechanism that ensures finite sample requirements for algorithmic implementation, a significant improvement over previous model-free algorithms. We develop algorithms for uncertainty sets defined by total variation, Chi-square divergence, and KL divergence, and provide finite sample analyses for all three cases. Remarkably, our algorithms represent the first…
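To make the role of the threshold concrete, here is a minimal Python sketch of a truncated MLMC estimator for a nonlinear function g of an expectation E[X]. It is an illustration under stated assumptions, not the paper's algorithm: the threshold is assumed to be a cap n_max on the randomly drawn level, and the names truncated_mlmc, sample, g, and n_max are hypothetical. Without a cap, the level is geometric and the number of samples per estimate has infinite expectation; with the cap, each estimate uses at most 2**(n_max + 1) samples, at the cost of a bias that shrinks as n_max grows.

```python
import random


def truncated_mlmc(sample, g, n_max, rng):
    """Return (estimate of g(E[X]), number of samples drawn).

    sample(rng) draws one X; g maps a sample mean to a value.
    n_max is the assumed threshold: the level never exceeds it,
    so at most 2 ** (n_max + 1) samples are drawn per call.
    """
    # Level probabilities proportional to 2^-(n+1), renormalised over 0..n_max.
    weights = [2.0 ** -(n + 1) for n in range(n_max + 1)]
    total = sum(weights)
    probs = [w / total for w in weights]
    level = rng.choices(range(n_max + 1), weights=probs)[0]

    xs = [sample(rng) for _ in range(2 ** (level + 1))]

    def mean(values):
        return sum(values) / len(values)

    # Multi-level correction: g on the full batch versus the average of g
    # on its two halves (even-indexed and odd-indexed samples).
    delta = g(mean(xs)) - 0.5 * (g(mean(xs[0::2])) + g(mean(xs[1::2])))

    # Single-sample base term plus the importance-weighted correction.
    return g(xs[0]) + delta / probs[level], len(xs)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy target (an assumption for illustration): g(m) = m ** 2, X ~ Uniform(0, 1).
    runs = [truncated_mlmc(lambda r: r.random(), lambda m: m * m, 4, rng)
            for _ in range(10000)]
    print("average estimate:", sum(e for e, _ in runs) / len(runs))
    print("max samples in one call:", max(n for _, n in runs))
```

In expectation the corrections telescope, so the estimator targets E[g(mean of 2**(n_max + 1) samples)] rather than g(E[X]) exactly. In a robust Bellman update, g would be the dual form of the worst-case expectation for the chosen divergence, which is where the nonlinearity comes from.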
Taxonomy
Topics: Reinforcement Learning in Robotics
