Towards Minimax Optimality of Model-based Robust Reinforcement Learning

Pierre Clavier; Erwan Le Pennec; Matthieu Geist

arXiv:2302.05372 · cs.LG · June 7, 2024

TL;DR

This paper establishes near-optimal sample complexity bounds for model-based robust reinforcement learning in discounted MDPs with $L_p$-ball uncertainty sets, improving previous bounds and matching non-robust lower bounds under certain conditions.

Contribution

It gives the first sample complexity bounds for robust RL with $L_p$-ball uncertainty sets that match the non-robust minimax rate, a match that holds when the uncertainty is small.
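For orientation, the objects named above can be written out as follows. This is a sketch in assumed notation rather than the paper's own statement: the symbols $P^{0}$ (nominal kernel), $\beta_{s,a}$ (ball radius), $\Delta(S)$ (probability simplex over states), $\gamma$ (discount factor), and the reading of $H$ as the effective horizon $1/(1-\gamma)$ are assumptions introduced here for illustration.

```latex
% Assumed notation: P^0 nominal kernel, beta_{s,a} radius, Delta(S) simplex.
% An sa-rectangular L_p-ball uncertainty set constrains each (s,a) row separately:
\[
\mathcal{U}^{sa}_{p}
  = \bigotimes_{(s,a)}
    \left\{ P_{s,a} \in \Delta(S) \;:\;
            \lVert P_{s,a} - P^{0}_{s,a} \rVert_{p} \le \beta_{s,a} \right\},
\qquad
H = \frac{1}{1-\gamma}.
\]
% Robust value of a policy pi, and the epsilon-optimality target for a learned policy:
\[
v^{\pi}_{\mathcal{U}} = \min_{P \in \mathcal{U}} v^{\pi}_{P},
\qquad
\bigl\lVert \max_{\pi} v^{\pi}_{\mathcal{U}} - v^{\hat{\pi}}_{\mathcal{U}} \bigr\rVert_{\infty} \le \epsilon .
\]
```

With $p = 1$ the ball is, up to a constant factor, a total variation ball, which is why the $L_p$ setting covers the TV case.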

Findings

01

Sample complexity of $\tilde{O}(H^4 |S||A| / \epsilon^2)$ in the general case.

02

Improved sample complexity of $\tilde{O}(H^3 |S||A| / \epsilon^2)$ when the uncertainty is small.

03

In the small-uncertainty regime, the upper bound matches the non-robust lower bound, and a robust lower bound is established under specific conditions.

Abstract

We study the sample complexity of obtaining an $\epsilon$-optimal policy in Robust discounted Markov Decision Processes (RMDPs), given only access to a generative model of the nominal kernel. This problem is widely studied in the non-robust case, and it is known that any planning approach applied to an empirical MDP estimated with $\tilde{O}(\frac{H^{3} |S||A|}{\epsilon^{2}})$ samples provides an $\epsilon$-optimal policy, which is minimax optimal. Results in the robust case are much more scarce. For $sa$- (resp. $s$-)rectangular uncertainty sets, the best known sample complexity is $\tilde{O}(\frac{H^{4} |S|^{2} |A|}{\epsilon^{2}})$ (resp. $\tilde{O}(\frac{H^{4} |S|^{2} |A|^{2}}{\epsilon^{2}})$), for specific algorithms and when the uncertainty set is based on the total variation (TV), the KL or the Chi-square divergences.…
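To make the gaps between these rates concrete, the following minimal sketch evaluates only the leading-order terms quoted on this page, with constants and logarithmic factors dropped. The function name `leading_term` and the problem sizes are hypothetical and chosen purely for illustration; the numbers are not sample counts from the paper.

```python
from fractions import Fraction


def leading_term(H, S, A, eps, h_pow, s_pow=1, a_pow=1):
    """Return H^h_pow * S^s_pow * A^a_pow / eps^2.

    Only the leading-order term is computed: the constants and log
    factors hidden by the O-tilde notation are dropped.
    """
    return Fraction(H) ** h_pow * S ** s_pow * A ** a_pow / Fraction(eps) ** 2


# Hypothetical problem size (assumption, for illustration only).
H, S, A, eps = 10, 20, 5, Fraction(1, 10)

bounds = {
    # Non-robust minimax rate quoted in the abstract.
    "non_robust": leading_term(H, S, A, eps, h_pow=3),
    # Prior robust rates quoted in the abstract.
    "prior_sa_rect": leading_term(H, S, A, eps, h_pow=4, s_pow=2),
    "prior_s_rect": leading_term(H, S, A, eps, h_pow=4, s_pow=2, a_pow=2),
    # Rates listed under Findings.
    "general_case": leading_term(H, S, A, eps, h_pow=4),
    "small_uncertainty": leading_term(H, S, A, eps, h_pow=3),
}

for name, value in bounds.items():
    print(f"{name:>18}: {float(value):.3g}")

# Ratios show which factors each improvement removes.
gain_over_sa = bounds["prior_sa_rect"] / bounds["general_case"]      # equals S
gain_over_s = bounds["prior_s_rect"] / bounds["general_case"]        # equals S * A
gain_small = bounds["general_case"] / bounds["small_uncertainty"]    # equals H
```

Exact rational arithmetic via `Fraction` keeps the ratios free of floating-point noise: with these sizes the general-case rate is smaller than the prior $sa$-rectangular rate by a factor of $|S| = 20$ and than the prior $s$-rectangular rate by $|S||A| = 100$, and the small-uncertainty rate saves a further factor of $H = 10$.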

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Adversarial Robustness in Machine Learning