An Offline Risk-aware Policy Selection Method for Bayesian Markov Decision Processes
Giorgio Angelotti, Nicolas Drougard, Caroline Ponzoni Carvalho Chanel

TL;DR
This paper introduces EvC (Exploitation vs Caution), a Bayesian offline policy selection method that trades off exploitation against caution, aiming to choose policies that remain reliable at deployment despite limited data.
Contribution
The paper proposes a novel risk-aware policy selection framework, EvC, that incorporates Bayesian uncertainty to improve robustness in offline MDP planning and reinforcement learning.
Findings
EvC effectively selects robust policies in simple discrete environments.
EvC outperforms state-of-the-art approaches in terms of robustness.
The method is suitable for offline applications prioritizing safety and reliability.
Abstract
In Offline Model Learning for Planning and in Offline Reinforcement Learning, the limited data set hinders the estimation of the value function of the corresponding Markov Decision Process (MDP). Consequently, the performance of the obtained policy in the real world is bounded and possibly risky, especially when the deployment of a wrong policy can lead to catastrophic consequences. For this reason, several pathways are being followed with the aim of reducing the model error (or the distributional shift between the learned model and the true one) and, more broadly, obtaining risk-aware solutions with respect to model uncertainty. But when it comes to the final application, which baseline should a practitioner choose? In an offline context where computational time is not an issue and robustness is the priority, we propose Exploitation vs Caution (EvC), a paradigm that (1) elegantly incorporates…
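The abstract is cut off before it describes how EvC works, so the following is not the authors' algorithm. It is a minimal sketch of the general idea the abstract points to: selecting among candidate policies in a risk-aware way under Bayesian uncertainty about the MDP model. Everything specific here is an assumption made for illustration: a tabular MDP with known rewards, independent Dirichlet posteriors over transitions built from offline counts, a lower-tail CVaR as the risk measure, and the hypothetical function names `sample_transition_model`, `policy_value`, `lower_cvar`, and `select_policy`.

```python
import math
import random

def sample_transition_model(counts, prior=1.0, rng=random):
    """Draw one model P[s][a][s'] from independent Dirichlet posteriors.

    counts[s][a][s'] are transition counts from the offline data set;
    `prior` is an assumed symmetric Dirichlet pseudo-count.
    """
    model = []
    for row in counts:
        model_row = []
        for c in row:
            # A Dirichlet draw is a vector of Gamma draws, normalised.
            g = [rng.gammavariate(n + prior, 1.0) for n in c]
            total = sum(g)
            model_row.append([x / total for x in g])
        model.append(model_row)
    return model

def policy_value(P, R, policy, gamma=0.9, sweeps=500):
    """Iterative policy evaluation, averaged over a uniform start state."""
    n_states = len(R)
    V = [0.0] * n_states
    for _ in range(sweeps):
        V = [
            R[s][policy[s]]
            + gamma * sum(p * v for p, v in zip(P[s][policy[s]], V))
            for s in range(n_states)
        ]
    return sum(V) / n_states

def lower_cvar(values, alpha):
    """Mean of the worst alpha-fraction of values (assumed risk measure)."""
    ordered = sorted(values)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k

def select_policy(candidates, counts, R, gamma=0.9, alpha=0.1,
                  n_samples=100, seed=0):
    """Score each candidate by its lower-tail CVaR across sampled models."""
    rng = random.Random(seed)
    models = [sample_transition_model(counts, rng=rng)
              for _ in range(n_samples)]
    scores = [
        lower_cvar([policy_value(P, R, pi, gamma) for P in models], alpha)
        for pi in candidates
    ]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores

# Toy usage: 2 states, 2 actions; action 0 always pays 1, action 1 pays 0.
R = [[1.0, 0.0], [1.0, 0.0]]
counts = [[[3, 1], [1, 3]], [[2, 2], [0, 4]]]
candidates = [[0, 0], [1, 1]]  # deterministic policies: state -> action
best, scores = select_policy(candidates, counts, R)
```

Scoring by a lower-tail statistic rather than the posterior mean is what makes the selection cautious: a policy that looks good on average but performs badly under some plausible models is penalised. Setting `alpha=1.0` recovers the plain posterior-mean criterion.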
Taxonomy
Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Bayesian Modeling and Causal Inference
