Offline Bayesian Aleatoric and Epistemic Uncertainty Quantification and Posterior Value Optimisation in Finite-State MDPs
Filippo Valdettaro, A. Aldo Faisal

TL;DR
This paper introduces a Bayesian approach to quantify and disentangle aleatoric and epistemic uncertainties in finite-state MDPs, and proposes a method to optimize policies based on posterior expected value, demonstrated on gridworlds and the AI Clinician problem.
Contribution
It presents a novel technique for uncertainty quantification and policy optimization in offline finite-state MDPs without strong distributional assumptions.
Findings
Effective uncertainty disentanglement in simple MDPs
Successful policy optimization using closed-form value expressions
Scalable approach demonstrated on real-world ICU treatment data
Abstract
We address the challenge of quantifying Bayesian uncertainty and incorporating it in offline use cases of finite-state Markov Decision Processes (MDPs) with unknown dynamics. Our approach provides a principled method to disentangle epistemic and aleatoric uncertainty, and a novel technique to find policies that optimise Bayesian posterior expected value without relying on strong assumptions about the MDP's posterior distribution. First, we utilise standard Bayesian reinforcement learning methods to capture the posterior uncertainty in MDP parameters based on available data. We then analytically compute the first two moments of the return distribution across posterior samples and apply the law of total variance to disentangle aleatoric and epistemic uncertainties. To find policies that maximise posterior expected value, we leverage the closed-form expression for value as a function of…
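The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an independent Dirichlet posterior over each row of transition probabilities (a common choice in Bayesian reinforcement learning), known deterministic state rewards, and a fixed deterministic policy. The names `return_moments`, `decompose_uncertainty`, `counts`, and `prior` are hypothetical. For each sampled MDP the mean and variance of the discounted return are computed in closed form. The law of total variance then splits the total variance into the posterior average of the within-model variance (aleatoric) and the posterior variance of the within-model mean (epistemic).

```python
import numpy as np


def return_moments(P, r, gamma):
    """Mean and variance of the discounted return from each state of a
    Markov reward process with transition matrix P and state rewards r."""
    I = np.eye(len(r))
    # First moment: V = r + gamma * P V
    V = np.linalg.solve(I - gamma * P, r)
    # Second moment: M = r^2 + 2*gamma*r*(P V) + gamma^2 * P M
    M = np.linalg.solve(I - gamma**2 * P, r**2 + 2 * gamma * r * (P @ V))
    return V, M - V**2


def decompose_uncertainty(counts, r, policy, gamma=0.9,
                          n_samples=500, prior=1.0, seed=0):
    """Split return variance into aleatoric and epistemic parts.

    counts[s, a, s2] holds observed transition counts (assumed data format);
    policy[s] is the action taken in state s; prior is an assumed symmetric
    Dirichlet concentration.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions, _ = counts.shape
    means, variances = [], []
    for _ in range(n_samples):
        # Draw one transition model from the Dirichlet posterior.
        P_full = np.empty(counts.shape, dtype=float)
        for s in range(n_states):
            for a in range(n_actions):
                P_full[s, a] = rng.dirichlet(counts[s, a] + prior)
        # Transition matrix induced by the fixed policy.
        P_pi = P_full[np.arange(n_states), policy]
        V, var = return_moments(P_pi, r, gamma)
        means.append(V)
        variances.append(var)
    means = np.array(means)
    variances = np.array(variances)
    aleatoric = variances.mean(axis=0)  # E_posterior[ Var(G | MDP) ]
    epistemic = means.var(axis=0)       # Var_posterior( E[G | MDP] )
    return means.mean(axis=0), aleatoric, epistemic
```

In this sketch the epistemic term should shrink as `counts` grows, because the posterior concentrates, while the aleatoric term reflects randomness in the transitions themselves and persists.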
Taxonomy
Topics: Numerical Methods and Algorithms · Formal Methods in Verification · Low-power high-performance VLSI design
