Optimizing Percentile Criterion Using Robust MDPs

Bahram Behzadian; Reazul Hasan Russel; Marek Petrik; Chin Pang Ho

arXiv:1910.10786 · cs.LG · March 1, 2021 · 5 cites

Open Access

TL;DR

This paper introduces new algorithms for optimizing the percentile criterion in reinforcement learning using Robust MDPs, focusing on minimizing ambiguity set spans to improve policy reliability with limited data.

Contribution

It proposes novel algorithms that minimize the span of ambiguity sets in Robust MDPs, enhancing the reliability of policies under uncertainty with theoretical guarantees.

Findings

01. Optimized ambiguity sets significantly outperform prior methods.

02. The algorithms minimize the span of ambiguity sets defined by weighted $L_{1}$ and $L_{\infty}$ norms.

03. The methods provide Bayesian and frequentist guarantees, supported by new concentration inequalities for weighted $L_{1}$ and $L_{\infty}$ norms.

Abstract

We address the problem of computing reliable policies in reinforcement learning problems with limited data. In particular, we compute policies that achieve good returns with high confidence when deployed. This objective, known as the percentile criterion, can be optimized using Robust MDPs (RMDPs). RMDPs generalize MDPs to allow for uncertain transition probabilities chosen adversarially from given ambiguity sets. We show that the RMDP solution's sub-optimality depends on the spans of the ambiguity sets along the value function. We then propose new algorithms that minimize the span of ambiguity sets defined by weighted $L_{1}$ and $L_{\infty}$ norms. Our primary focus is on Bayesian guarantees, but we also describe how our methods apply to frequentist guarantees and derive new concentration inequalities for weighted $L_{1}$ and $L_{\infty}$ norms. Experimental results indicate that our…
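To make the phrase "transition probabilities chosen adversarially from given ambiguity sets" concrete, the sketch below computes the worst-case expected value for a single state-action pair. It is not the paper's algorithm: it assumes a plain (unweighted) $L_{1}$ ball around a nominal distribution, and the names worst_case_l1, p_bar, v, and budget are hypothetical, chosen only for illustration. It uses the standard greedy argument for this simplified set: shift up to budget / 2 of probability mass onto the lowest-value next state and take it from the highest-value states.

```python
def worst_case_l1(p_bar, v, budget):
    """Minimize sum(p[i] * v[i]) over distributions p with ||p - p_bar||_1 <= budget.

    Illustrative assumption: an unweighted L1 ball around the nominal
    distribution p_bar. The paper studies weighted L1 and L-infinity sets,
    which this sketch does not cover.
    """
    p = list(p_bar)
    lowest = min(range(len(v)), key=lambda i: v[i])

    # An L1 budget allows moving at most budget / 2 of mass, because the
    # same amount is added in one place and removed in another.
    eps = min(budget / 2.0, 1.0 - p[lowest])
    p[lowest] += eps

    # Remove the same amount of mass from states in order of decreasing value.
    for i in sorted(range(len(v)), key=lambda i: v[i], reverse=True):
        if eps <= 0:
            break
        if i == lowest:
            continue
        take = min(eps, p[i])
        p[i] -= take
        eps -= take

    return p, sum(pi * vi for pi, vi in zip(p, v))


# Hypothetical example: three next states with values v.
p_nominal = [0.5, 0.3, 0.2]
v = [1.0, 2.0, 3.0]
p_worst, robust_value = worst_case_l1(p_nominal, v, budget=0.4)
nominal_value = sum(pi * vi for pi, vi in zip(p_nominal, v))
print(p_worst, robust_value, nominal_value)
```

The gap between the nominal and the worst-case value grows with the size of the ambiguity set and with how much the value function varies across next states, which gives some intuition for why the abstract ties sub-optimality to the span of the ambiguity set along the value function.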

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Risk and Portfolio Optimization