Bayesian learning of the optimal action-value function in a Markov decision process

Jiaqi Guo; Chon Wai Ho; Sumeetpal S. Singh

arXiv:2505.01859·stat.ML·May 6, 2025

TL;DR

This paper develops a full Bayesian framework for learning the optimal action-value function in undiscounted Markov decision processes with finite state and action spaces. The framework relies on minimal modelling assumptions, adds artificial noise to handle deterministic rewards, and uses an adaptive sequential Monte Carlo method for inference.

Contribution

The paper introduces a full Bayesian approach for MDPs, comprising a novel likelihood model, artificial noise to handle deterministic rewards, and an adaptive sequential Monte Carlo algorithm for inference.

Findings

01 Demonstrates exploration benefits of posterior sampling in MDPs

02 Provides a new Bayesian perspective on action selection, generalizing Thompson sampling

03 Validates the framework on the Deep Sea benchmark problem
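
For readers unfamiliar with Thompson sampling, which the second finding says the paper generalizes, the following is a minimal sketch of the classical version for a Bernoulli bandit with Beta posteriors. It is the standard textbook algorithm, not the paper's method: the function name `thompson_sampling`, the uniform Beta(1, 1) prior, and the arm probabilities in the usage note are illustrative assumptions.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Classical Thompson sampling on a Bernoulli bandit (illustrative only).

    true_probs: hypothetical success probability of each arm (unknown to the agent).
    Returns per-arm pull counts, success counts, and failure counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one sample per arm from its Beta posterior (Beta(1, 1) prior assumed).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled values.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Observe a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, successes, failures
```

Calling `thompson_sampling([0.1, 0.9], 2000)` should concentrate most pulls on the second arm as its posterior sharpens. Exploration arises because arms with wide posteriors occasionally produce the largest sample.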

Abstract

The Markov Decision Process (MDP) is a popular framework for sequential decision-making problems, and uncertainty quantification is essential for learning optimal decision-making strategies within it. In particular, a Bayesian framework is used to maintain beliefs about the optimal decisions and the unknown ingredients of the model, such as the rewards and state dynamics, which are also to be learned from the data. However, many existing Bayesian approaches for learning the optimal decision-making strategy are based on unrealistic modelling assumptions and utilise approximate inference techniques. This raises doubts as to whether the benefits of Bayesian uncertainty quantification are fully realised or can be relied upon. We focus on infinite-horizon and undiscounted MDPs, with finite state and action spaces, and a terminal state. We provide a full Bayesian framework, from modelling to…

Taxonomy

Topics: Simulation Techniques and Applications