Improved Sample Complexity Bounds for Distributionally Robust Reinforcement Learning
Zaiyan Xu, Kishan Panaganti, Dileep Kalathil

TL;DR
This paper introduces a new algorithm for distributionally robust reinforcement learning that improves sample complexity bounds across several uncertainty sets, including the first such bound for Wasserstein uncertainty sets.
Contribution
The paper proposes the Robust Phased Value Learning algorithm with improved sample complexity bounds for multiple divergence measures, including the first analysis for Wasserstein uncertainty sets.
Findings
Achieves $\tilde{O}(|S||A| H^{5})$ sample complexity, improving on existing results by a factor of $|S|$.
Provides the first sample complexity result for Wasserstein uncertainty sets.
Demonstrates effectiveness through simulation experiments.
Abstract
We consider the problem of learning a control policy that is robust against the parameter mismatches between the training environment and testing environment. We formulate this as a distributionally robust reinforcement learning (DR-RL) problem where the objective is to learn the policy which maximizes the value function against the worst possible stochastic model of the environment in an uncertainty set. We focus on the tabular episodic learning setting where the algorithm has access to a generative model of the nominal (training) environment around which the uncertainty set is defined. We propose the Robust Phased Value Learning (RPVL) algorithm to solve this problem for the uncertainty sets specified by four different divergences: total variation, chi-square, Kullback-Leibler, and Wasserstein. We show that our algorithm achieves …
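To make the "worst possible stochastic model in an uncertainty set" objective concrete, here is a minimal sketch of robust backward induction for a tabular, finite-horizon problem with a total variation uncertainty set. It is not the paper's RPVL algorithm: it assumes the nominal transition kernel is known exactly (no generative-model sampling), assumes the uncertainty set factorizes over state-action pairs, and assumes total variation is defined as half the L1 distance. The names `worst_case_expectation_tv`, `robust_backward_induction`, `P`, `R`, and `rho` are illustrative and do not come from the paper.

```python
import numpy as np


def worst_case_expectation_tv(p, v, rho):
    """Minimize q @ v over distributions q with 0.5 * ||q - p||_1 <= rho.

    Assumption: total variation is half the L1 distance. The minimizer moves
    up to `rho` probability mass from the highest-value states onto the
    lowest-value state.
    """
    q = np.asarray(p, dtype=float).copy()
    v = np.asarray(v, dtype=float)
    lowest = int(np.argmin(v))
    budget = float(rho)
    for s in np.argsort(-v):  # visit states from highest to lowest value
        if budget <= 0.0 or s == lowest:
            break
        moved = min(q[s], budget)
        q[s] -= moved
        q[lowest] += moved
        budget -= moved
    return float(q @ v)


def robust_backward_induction(P, R, H, rho):
    """Robust dynamic programming on a known nominal model (illustrative).

    P: nominal transition kernel, shape (S, A, S).
    R: rewards, shape (S, A).
    H: horizon length.
    rho: radius of the assumed per-(s, a) total variation ball.
    Returns the robust value at the first step and a greedy policy (H, S).
    """
    S, A, _ = P.shape
    V = np.zeros(S)  # value after the final step is zero
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                Q[s, a] = R[s, a] + worst_case_expectation_tv(P[s, a], V, rho)
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return V, policy


# Toy usage: 2 states, 2 actions, horizon 3 (all numbers are made up).
P = np.array([[[0.9, 0.1], [0.5, 0.5]],
              [[0.2, 0.8], [0.6, 0.4]]])
R = np.array([[0.0, 0.3],
              [1.0, 0.5]])
V_robust, pi_robust = robust_backward_induction(P, R, H=3, rho=0.1)
V_nominal, _ = robust_backward_induction(P, R, H=3, rho=0.0)
```

With `rho=0.0` the inner minimization reduces to the ordinary expectation, so the routine falls back to standard finite-horizon value iteration; for any positive radius the robust values are no larger than the nominal ones.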
Taxonomy
Topics: Reinforcement Learning in Robotics
