Best-Effort Policies for Robust Markov Decision Processes
Alessandro Abate, Thom Badings, Giuseppe De Giacomo, Francesco Fabiano

TL;DR
This paper introduces the concept of optimal robust best-effort (ORBE) policies for robust MDPs, which aim to maximize expected returns under both adversarial and non-adversarial transition probabilities, providing a new policy selection criterion.
Contribution
The paper proposes ORBE policies for robust MDPs, proves that they exist, characterizes their structure, and develops an efficient algorithm to compute them.
Findings
ORBE policies always exist in robust MDPs.
The structure of ORBE policies is characterized in the paper.
An efficient algorithm for computing ORBE policies is presented.
Abstract
We study the common generalization of Markov decision processes (MDPs) with sets of transition probabilities, known as robust MDPs (RMDPs). A standard goal in RMDPs is to compute a policy that maximizes the expected return under an adversarial choice of the transition probabilities. If the uncertainty in the probabilities is independent between the states, known as s-rectangularity, such optimal robust policies can be computed efficiently using robust value iteration. However, there might still be multiple optimal robust policies, which, while equivalent with respect to the worst-case, reflect different expected returns under non-adversarial choices of the transition probabilities. Hence, we propose a refined policy selection criterion for RMDPs, drawing inspiration from the notions of dominance and best-effort in game theory. Instead of seeking a policy that only maximizes the…
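The robust value iteration mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm and does not compute ORBE policies. It rests on simplifying assumptions that are not from the source: each state-action pair has a finite set of candidate next-state distributions (an (s,a)-rectangular model, which is a special case of s-rectangularity), returns are discounted, and the function name `robust_value_iteration` and the three-state toy model are hypothetical.

```python
import numpy as np

def robust_value_iteration(P_sets, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Worst-case value iteration (illustrative sketch, hypothetical helper).

    P_sets[s][a] is a list of candidate next-state distributions (1-D arrays),
    R[s][a] is the immediate reward. Returns worst-case values V and Q-values.
    """
    n = len(P_sets)
    V = np.zeros(n)
    Q = [[0.0] * len(P_sets[s]) for s in range(n)]
    for _ in range(max_iter):
        # The adversary picks the candidate distribution that minimizes the value.
        Q = [[R[s][a] + gamma * min(float(p @ V) for p in P_sets[s][a])
              for a in range(len(P_sets[s]))]
             for s in range(n)]
        V_new = np.array([max(q) for q in Q])
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, Q

# Toy model (assumed for illustration): state 1 is a rewarding sink,
# state 2 is a worthless sink, and state 0 has two actions.
to_good = np.array([0.0, 1.0, 0.0])
to_bad = np.array([0.0, 0.0, 1.0])
P_sets = [
    [[to_bad, np.array([0.0, 0.5, 0.5])],   # action 0 in state 0
     [to_bad, np.array([0.0, 0.9, 0.1])]],  # action 1 in state 0
    [[to_good]],                            # state 1: stay, reward 1
    [[to_bad]],                             # state 2: stay, reward 0
]
R = [[0.0, 0.0], [1.0], [0.0]]

V, Q = robust_value_iteration(P_sets, R)

# Value of each action in state 0 under the non-adversarial candidate.
nominal = [0.9 * float(P_sets[0][a][1] @ V) for a in range(2)]
```

In this toy model both actions in state 0 have the same worst-case Q-value, because the adversary can always send the agent to the worthless sink, yet action 1 yields a higher value under the non-adversarial candidate distribution. This is the kind of tie between optimal robust policies that the abstract describes.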
Taxonomy
Topics: Reinforcement Learning in Robotics · Risk and Portfolio Optimization · Advanced Bandit Algorithms Research
