Optimizing Percentile Criterion Using Robust MDPs
Bahram Behzadian, Reazul Hasan Russel, Marek Petrik, Chin Pang Ho

TL;DR
This paper introduces new algorithms for optimizing the percentile criterion in reinforcement learning using Robust MDPs, focusing on minimizing ambiguity set spans to improve policy reliability with limited data.
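The percentile criterion and its connection to RMDPs can be written compactly as below. The notation is assumed here rather than copied from the paper. ρ(π, P) is the expected return of policy π under transition model P. f(· | D) is the posterior over the true model P* given data D. δ is the risk level. If an ambiguity set 𝒫 contains P* with posterior probability at least 1 − δ, then P* ∈ 𝒫 implies ρ(π, P*) ≥ min over P ∈ 𝒫 of ρ(π, P). The robust return is therefore a feasible value of y in the criterion.

```latex
% Percentile criterion (notation assumed)
\max_{\pi,\; y \in \mathbb{R}} \; y
\quad \text{subject to} \quad
\Pr_{P^\star \sim f(\cdot \mid D)}\bigl[\, \rho(\pi, P^\star) \ge y \,\bigr] \ge 1 - \delta

% Robust return as a lower bound on the (1 - \delta) percentile
\Pr\bigl[\, P^\star \in \mathcal{P} \mid D \,\bigr] \ge 1 - \delta
\;\Longrightarrow\;
\Pr\Bigl[\, \rho(\pi, P^\star) \ge \min_{P \in \mathcal{P}} \rho(\pi, P) \;\Bigm|\; D \,\Bigr] \ge 1 - \delta
```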
Contribution
It proposes novel algorithms that minimize the span of ambiguity sets in Robust MDPs, enhancing the reliability of policies under uncertainty with theoretical guarantees.
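The "span of an ambiguity set along the value function" is the gap between the best-case and worst-case expected next-state value over the set. The following minimal Python sketch computes it under assumptions that do not come from the paper. The set is a weighted L∞ ball around a nominal distribution pbar, intersected with the probability simplex. The weighted norm is taken as max_i w_i |x_i|. The helper names extreme_expectation and span are hypothetical. Because such a set is a box intersected with the simplex, both extremes can be found greedily.

```python
# Minimal sketch (assumptions, not the paper's code): the ambiguity set is
#   P = {p in simplex : max_i w_i * |p_i - pbar_i| <= psi},
# i.e. a weighted L-infinity ball around a nominal distribution pbar.

def extreme_expectation(v, pbar, w, psi, minimize=True):
    """Min (or max) of sum_i p_i * v_i over the weighted L-infinity ambiguity set."""
    n = len(v)
    # The constraint w_i * |p_i - pbar_i| <= psi is a box on each coordinate.
    lo = [max(0.0, pbar[i] - psi / w[i]) for i in range(n)]
    hi = [min(1.0, pbar[i] + psi / w[i]) for i in range(n)]
    p = list(lo)
    remaining = max(0.0, 1.0 - sum(lo))  # mass still to place so that p sums to 1
    # Fill the lowest-value states first when minimizing, highest-value when maximizing.
    order = sorted(range(n), key=lambda i: v[i], reverse=not minimize)
    for i in order:
        add = min(hi[i] - lo[i], remaining)
        p[i] += add
        remaining -= add
        if remaining <= 0.0:
            break
    return sum(pi * vi for pi, vi in zip(p, v))


def span(v, pbar, w, psi):
    """Span of the ambiguity set along v: best-case minus worst-case expected value."""
    return (extreme_expectation(v, pbar, w, psi, minimize=False)
            - extreme_expectation(v, pbar, w, psi, minimize=True))
```

For example, with v = [0, 1, 2], a uniform pbar, unit weights, and psi = 0.1, the span is 0.4. Doubling every weight halves each coordinate's half-width and the span drops to 0.2. This is why the choice of weights matters when the goal is a tighter robust bound.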
Findings
Optimized ambiguity sets significantly outperform prior methods.
Algorithms effectively minimize the span of ambiguity sets defined by weighted L₁ and L∞ norms.
Methods provide Bayesian and frequentist guarantees with new concentration inequalities.
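One common way to obtain a Bayesian guarantee is to sample transition probabilities from a Dirichlet posterior. The radius psi is then set to the empirical (1 − delta) quantile of the samples' weighted L₁ distances from the posterior mean. The Python sketch below illustrates that generic construction and is an assumption, not the authors' algorithm. The function names, the uniform Dirichlet prior, and the sample count are illustrative choices. The paper's algorithms additionally aim to minimize the span of the resulting set, which this sketch does not attempt.

```python
import math
import random


def dirichlet_sample(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing independent Gamma draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]


def weighted_l1(x, y, w):
    """Weighted L1 distance sum_i w_i * |x_i - y_i|."""
    return sum(wi * abs(xi - yi) for xi, yi, wi in zip(x, y, w))


def bayesian_radius(counts, w, delta, num_samples=2000, prior=1.0, seed=0):
    """Hypothetical helper: return (pbar, psi) such that roughly (1 - delta) of the
    posterior samples satisfy ||p - pbar||_{1,w} <= psi, with pbar the posterior mean.
    Assumes a Dirichlet(counts + prior) posterior for one state-action pair."""
    rng = random.Random(seed)
    alpha = [c + prior for c in counts]
    total = sum(alpha)
    pbar = [a / total for a in alpha]
    dists = sorted(weighted_l1(dirichlet_sample(alpha, rng), pbar, w)
                   for _ in range(num_samples))
    k = math.ceil((1.0 - delta) * num_samples) - 1  # index of the (1 - delta) empirical quantile
    return pbar, dists[max(k, 0)]
```

As expected, more observed transitions concentrate the posterior and shrink the radius. Counts of 500 per next state give a much smaller psi than counts of 5.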
Abstract
We address the problem of computing reliable policies in reinforcement learning problems with limited data. In particular, we compute policies that achieve good returns with high confidence when deployed. This objective, known as the percentile criterion, can be optimized using Robust MDPs (RMDPs). RMDPs generalize MDPs to allow for uncertain transition probabilities chosen adversarially from given ambiguity sets. We show that the RMDP solution's sub-optimality depends on the spans of the ambiguity sets along the value function. We then propose new algorithms that minimize the span of ambiguity sets defined by weighted L₁ and L∞ norms. Our primary focus is on Bayesian guarantees, but we also describe how our methods apply to frequentist guarantees and derive new concentration inequalities for weighted L₁ and L∞ norms. Experimental results indicate that our…
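The adversarial choice of transition probabilities from an ambiguity set can be illustrated with robust value iteration over s,a-rectangular sets. The Python sketch below assumes the simplest case, an unweighted L₁ ball of radius psi around each nominal distribution. Its inner worst case has a known greedy solution: move up to psi/2 probability mass onto the lowest-value next state, taking it from the highest-value states first. The weighted sets studied in the paper need a more general inner solver. All names here are hypothetical.

```python
def worst_case_l1(v, pbar, psi):
    """min_p sum_i p_i * v_i  s.t.  ||p - pbar||_1 <= psi, p in the simplex.
    Assumption: unweighted L1 ball (all weights equal to 1)."""
    n = len(v)
    p = list(pbar)
    i_min = min(range(n), key=lambda i: v[i])
    # Move up to psi/2 probability mass onto the lowest-value state.
    eps = min(psi / 2.0, 1.0 - p[i_min])
    p[i_min] += eps
    # Remove the same mass from the highest-value states first.
    for i in sorted(range(n), key=lambda i: v[i], reverse=True):
        if eps <= 0.0:
            break
        if i == i_min:
            continue
        take = min(eps, p[i])
        p[i] -= take
        eps -= take
    return sum(pi * vi for pi, vi in zip(p, v))


def robust_value_iteration(P, R, gamma, psi, iters=500):
    """P[s][a]: nominal next-state distribution, R[s][a]: reward,
    psi: L1 radius of every (s, a) ambiguity set (assumed shared for simplicity)."""
    S = len(P)
    v = [0.0] * S
    for _ in range(iters):
        # Agent maximizes over actions; the adversary minimizes over the ambiguity set.
        v = [max(R[s][a] + gamma * worst_case_l1(v, P[s][a], psi)
                 for a in range(len(P[s])))
             for s in range(S)]
    return v
```

Consider a two-state example where state 0 earns reward 1 and nominally stays put, and state 1 is absorbing with reward 0. With gamma = 0.9, setting psi = 0.2 lets the adversary leak 0.1 probability into state 1 at every step. The robust value of state 0 then falls from 10 to 1/0.19 ≈ 5.26.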
