Bayesian learning of the optimal action-value function in a Markov decision process
Jiaqi Guo, Chon Wai Ho, Sumeetpal S. Singh

TL;DR
This paper develops a full Bayesian framework for learning the optimal action-value function in undiscounted Markov decision processes with finite state and action spaces. The framework rests on minimal modelling assumptions, adds artificial noise to handle deterministic rewards, and performs inference with an adaptive sequential Monte Carlo method.
Contribution
The paper introduces a full Bayesian approach for MDPs that combines a novel likelihood model, artificial noise to handle deterministic rewards, and an adaptive sequential Monte Carlo algorithm for inference.
Findings
Demonstrates exploration benefits of posterior sampling in MDPs
Provides a new Bayesian perspective on action selection, generalizing Thompson sampling
Validates the framework on the Deep Sea benchmark problem
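The posterior-sampling idea behind the findings above can be illustrated with a minimal sketch. It assumes the posterior over the optimal action-value function is represented by a list of equally weighted samples (for example, particles from a Monte Carlo sampler); the function name `thompson_action`, the dictionary layout, and the toy values are hypothetical and are not taken from the paper.

```python
import random

def thompson_action(q_samples, state, rng):
    """Draw one posterior sample of Q and act greedily with respect to it.

    q_samples: list of dicts mapping (state, action) -> value. Each dict is one
    hypothetical, equally weighted posterior sample of the optimal Q function.
    """
    q = rng.choice(q_samples)  # one draw from the approximate posterior
    actions = sorted(a for (s, a) in q if s == state)
    return max(actions, key=lambda a: q[(state, a)])

# Toy posterior: two samples that disagree about the best action in state 0.
q_samples = [
    {(0, "left"): 1.0, (0, "right"): 0.2},
    {(0, "left"): 0.1, (0, "right"): 0.9},
]

rng = random.Random(0)
counts = {"left": 0, "right": 0}
for _ in range(1000):
    counts[thompson_action(q_samples, 0, rng)] += 1
```

Because the two samples disagree, both actions are tried over repeated draws; as the posterior concentrates on one Q function, the sampled action settles on that function's greedy action. This is only the standard Thompson-sampling pattern, not the generalized action-selection rule the paper proposes.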
Abstract
The Markov Decision Process (MDP) is a popular framework for sequential decision-making problems, and uncertainty quantification is an essential component of it to learn optimal decision-making strategies. In particular, a Bayesian framework is used to maintain beliefs about the optimal decisions and the unknown ingredients of the model, which are also to be learned from the data, such as the rewards and state dynamics. However, many existing Bayesian approaches for learning the optimal decision-making strategy are based on unrealistic modelling assumptions and utilise approximate inference techniques. This raises doubts whether the benefits of Bayesian uncertainty quantification are fully realised or can be relied upon. We focus on infinite-horizon and undiscounted MDPs, with finite state and action spaces, and a terminal state. We provide a full Bayesian framework, from modelling to…
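For orientation, the object being learned in this setting can be written down with the standard Bellman optimality equation for an undiscounted MDP with a terminal state. The notation below ($s_{\mathrm{term}}$, $R$, $S'$, $\mathcal{A}$) is assumed for illustration and may differ from the paper's own.

```latex
% Optimal action-value function, undiscounted, with terminal state s_term.
% R is the (possibly random) reward and S' the next state after taking a in s.
Q^*(s, a) = \mathbb{E}\!\left[ R + \max_{a' \in \mathcal{A}} Q^*(S', a') \,\middle|\, s, a \right],
\qquad
Q^*(s_{\mathrm{term}}, a) = 0 \quad \text{for all } a \in \mathcal{A}.
```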
Taxonomy
Topics: Simulation Techniques and Applications
Methods: Focus
