Towards Minimax Optimality of Model-based Robust Reinforcement Learning
Pierre Clavier, Erwan Le Pennec, Matthieu Geist

TL;DR
This paper establishes near-optimal sample complexity bounds for model-based robust reinforcement learning in discounted MDPs with $L_p$-ball uncertainty sets, improving previous bounds and matching non-robust lower bounds under certain conditions.
Contribution
It provides the first minimax optimal sample complexity bounds for robust RL with $L_p$-ball uncertainty sets, matching non-robust bounds when the uncertainty is small.
Findings
Sample complexity of $\tilde{O}(H^4 |S||A| / \epsilon^2)$ for the general case.
Improved sample complexity of $\tilde{O}(H^3 |S||A| / \epsilon^2)$ when the uncertainty is small.
Results recover non-robust lower bounds and establish robust lower bounds under specific conditions.
Abstract
We study the sample complexity of obtaining an $\epsilon$-optimal policy in Robust discounted Markov Decision Processes (RMDPs), given only access to a generative model of the nominal kernel. This problem is widely studied in the non-robust case, and it is known that any planning approach applied to an empirical MDP estimated with $\tilde{O}(H^3 |S||A| / \epsilon^2)$ samples provides an $\epsilon$-optimal policy, which is minimax optimal. Results in the robust case are much more scarce. For $sa$- (resp. $s$-)rectangular uncertainty sets, the best known sample complexity is $\tilde{O}(H^4 |S|^2|A| / \epsilon^2)$ (resp. $\tilde{O}(H^4 |S|^2|A|^2 / \epsilon^2)$), for specific algorithms and when the uncertainty set is based on the total variation (TV), the KL or the Chi-square divergences.…
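As a rough illustration of the non-robust baseline the abstract refers to (planning on an empirical MDP built from generative-model samples), the sketch below estimates a transition kernel by sampling and runs plain value iteration on it. It is not the paper's robust algorithm. The toy MDP (`TRUE_P`, `REWARD`), the helper names, and the sample count are all hypothetical and chosen only for illustration.

```python
import random

def estimate_kernel(sample_next_state, n_states, n_actions, n_samples):
    """Empirical kernel: fraction of n_samples draws from (s, a) that land in each s'."""
    p_hat = [[[0.0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                p_hat[s][a][sample_next_state(s, a)] += 1.0 / n_samples
    return p_hat

def value_iteration(p, reward, gamma, n_iters=500):
    """Standard (non-robust) value iteration on a tabular discounted MDP."""
    n_states, n_actions = len(p), len(p[0])
    v = [0.0] * n_states
    for _ in range(n_iters):
        v = [
            max(
                reward[s][a] + gamma * sum(p[s][a][t] * v[t] for t in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
    return v

# Hypothetical 2-state, 2-action nominal MDP (not taken from the paper).
TRUE_P = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]]
REWARD = [[0.0, 0.0], [1.0, 0.5]]
rng = random.Random(0)

def sample_next_state(s, a):
    # Stand-in for the generative model: one draw from the nominal kernel at (s, a).
    return rng.choices(range(2), weights=TRUE_P[s][a])[0]

p_hat = estimate_kernel(sample_next_state, 2, 2, n_samples=2000)
v_hat = value_iteration(p_hat, REWARD, gamma=0.9)   # plan on the empirical MDP
v_true = value_iteration(TRUE_P, REWARD, gamma=0.9)  # reference values on the true kernel
```

In the robust setting studied here, the Bellman backup would additionally take a worst case over an uncertainty set around `p_hat`; that inner minimization is deliberately left out of this sketch.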
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Adversarial Robustness in Machine Learning
