Generalized Bayesian deep reinforcement learning

Shreya Sinha Roy; Richard G. Everitt; Christian P. Robert; Ritabrata Dutta

arXiv:2412.11743 · stat.ML · June 3, 2025

Open Access

TL;DR

This paper introduces a generalized Bayesian deep reinforcement learning framework that models environment dynamics with deep generative models, performs posterior inference with a generalized prequential scoring-rule posterior, and proposes an improved policy-learning method called expected Thompson sampling.
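The scoring-rule posterior mentioned above can be sketched in a generic form. The notation is ours, not the page's: a Markov generative model P_theta(. | s_{t-1}, a_{t-1}) for the next state, a prior pi(theta), a weight w > 0, and a proper scoring rule S. A prequential posterior scores each one-step-ahead prediction against the observed next state. The energy score, shown with its unbiased estimate from m generator draws x_1, ..., x_m, is one common choice of S; whether the paper uses this particular rule is an assumption here.

```latex
\begin{align*}
\pi_S\bigl(\theta \mid s_{0:T}, a_{0:T-1}\bigr)
  &\propto \pi(\theta)\,
  \exp\Bigl\{-w \sum_{t=1}^{T} S\bigl(P_\theta(\cdot \mid s_{t-1}, a_{t-1}),\, s_t\bigr)\Bigr\}, \\
S_E(P, y)
  &= 2\,\mathbb{E}\lVert X - y\rVert^{\beta} - \mathbb{E}\lVert X - X'\rVert^{\beta},
  \qquad X, X' \overset{\text{iid}}{\sim} P,\ \beta \in (0, 2), \\
\hat{S}_E\bigl(\{x_j\}_{j=1}^{m}, y\bigr)
  &= \frac{2}{m}\sum_{j=1}^{m}\lVert x_j - y\rVert^{\beta}
  - \frac{1}{m(m-1)}\sum_{j \neq k}\lVert x_j - x_k\rVert^{\beta}.
\end{align*}
```

Because the estimate needs only samples from the generator, no likelihood evaluation is required. This is what makes the approach usable for deep generative models that lack tractable likelihood functions.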

Contribution

It develops a generalized Bayesian inference approach for deep generative models of the environment dynamics in RL and introduces expected Thompson sampling to improve on standard Thompson sampling for policy optimization.
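The contrast between standard and expected Thompson sampling can be illustrated with a small tabular sketch. It assumes, as the name suggests, that "expected" Thompson sampling scores each candidate policy by its value averaged over several posterior draws, while standard Thompson sampling acts greedily with respect to a single draw. The 2-state MDP, the Dirichlet posterior over transitions, and every function name below are hypothetical stand-ins; the paper itself works with deep generative models and a scoring-rule posterior.

```python
import itertools

import numpy as np


def policy_value(P, R, policy, gamma):
    """Exact discounted value of a deterministic policy in a tabular MDP.

    P: (S, A, S) transition probabilities, R: (S, A) rewards, policy: one action per state.
    """
    S = P.shape[0]
    idx = np.arange(S)
    acts = list(policy)
    P_pi = P[idx, acts, :]  # (S, S) transitions under the policy
    r_pi = R[idx, acts]     # (S,) rewards under the policy
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)


def sample_transitions(counts, rng, prior=1.0):
    """One posterior draw of P from independent Dirichlet rows (illustrative assumption)."""
    P = np.empty(counts.shape)
    for s in range(counts.shape[0]):
        for a in range(counts.shape[1]):
            P[s, a] = rng.dirichlet(counts[s, a] + prior)
    return P


def all_policies(S, A):
    """Every deterministic stationary policy, as a tuple of one action per state."""
    return list(itertools.product(range(A), repeat=S))


def thompson_policy(counts, R, mu0, gamma, rng):
    """Standard TS: act greedily with respect to a single posterior draw."""
    P = sample_transitions(counts, rng)
    return max(all_policies(*R.shape), key=lambda pi: mu0 @ policy_value(P, R, pi, gamma))


def expected_thompson_policy(counts, R, mu0, gamma, rng, n_samples=64):
    """Hypothetical ETS reading: maximize the value averaged over many posterior draws."""
    draws = [sample_transitions(counts, rng) for _ in range(n_samples)]

    def expected_value(pi):
        return np.mean([mu0 @ policy_value(P, R, pi, gamma) for P in draws])

    return max(all_policies(*R.shape), key=expected_value)


# Toy example: reward 1 in state 1; action 1 usually moves the agent to state 1.
counts = np.array([[[50.0, 1.0], [1.0, 50.0]],
                   [[50.0, 1.0], [1.0, 50.0]]])  # counts[s, a, s'] of observed transitions
R = np.array([[0.0, 0.0], [1.0, 1.0]])
mu0 = np.array([1.0, 0.0])  # start in state 0
rng = np.random.default_rng(0)
ts_pi = thompson_policy(counts, R, mu0, 0.9, rng)
ets_pi = expected_thompson_policy(counts, R, mu0, 0.9, rng)
```

Scoring every candidate policy on one shared set of draws keeps the comparison between policies consistent. With a near-certain posterior, as in this toy, both rules pick the same policy; they can only disagree when individual posterior draws disagree about which policy is best.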

Findings

1. The proposed method outperforms standard Thompson sampling in simulations.
2. A Bernstein-von Mises theorem provides theoretical justification for the approach.
3. The approach is extended to continuous action spaces, with promising results.

Abstract

Bayesian reinforcement learning (BRL) is a method that merges principles from Bayesian statistics and reinforcement learning to make optimal decisions in uncertain environments. As a model-based RL method, it has two key components: (1) inferring the posterior distribution of the model for the data-generating process (DGP) and (2) policy learning using the learned posterior. We propose to model the dynamics of the unknown environment through deep generative models, assuming Markov dependence. In the absence of likelihood functions for these models, we train them by learning a generalized predictive-sequential (or prequential) scoring rule (SR) posterior. We use sequential Monte Carlo (SMC) samplers to draw samples from this generalized Bayesian posterior distribution. In addition, to achieve scalability in the high-dimensional parameter space of the neural networks, we use the…
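A tempered SMC sampler for a prequential scoring-rule posterior can be sketched on a one-parameter toy problem. Everything here is an assumption for illustration: a linear generator s' = theta * s + a + z with z ~ N(0, 1) stands in for a deep generative network, the energy score with beta = 1 is the scoring rule, the prior is N(0, 1), the weight is w = 1, and the generator noise is drawn once and held fixed so that the loss is a deterministic function of theta. The move step is a plain random-walk Metropolis-Hastings kernel, not necessarily the kernel the authors use, and all function names are hypothetical.

```python
import numpy as np


def energy_score(sims, y, beta=1.0):
    """Unbiased energy-score estimate per row; sims has shape (particles, m draws)."""
    m = sims.shape[1]
    term1 = 2.0 * np.mean(np.abs(sims - y) ** beta, axis=1)
    pair = np.abs(sims[:, :, None] - sims[:, None, :]) ** beta  # zero on the diagonal
    term2 = pair.sum(axis=(1, 2)) / (m * (m - 1))
    return term1 - term2


def prequential_loss(theta, states, actions, noise):
    """Sum over t of S(P_theta(. | s_{t-1}, a_{t-1}), s_t) for the toy generator."""
    total = np.zeros_like(theta)
    for t in range(1, len(states)):
        sims = theta[:, None] * states[t - 1] + actions[t - 1] + noise[t - 1][None, :]
        total += energy_score(sims, states[t])
    return total


def log_prior(theta):
    return -0.5 * theta**2  # N(0, 1) prior up to a constant (assumption)


def smc_sampler(states, actions, noise, w=1.0, n_particles=200, n_temps=10,
                n_moves=5, step=0.1, seed=0):
    """Tempered SMC from the prior to prior * exp(-w * prequential loss)."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n_particles)  # initial particles drawn from the prior
    loss = prequential_loss(theta, states, actions, noise)
    temps = np.linspace(0.0, 1.0, n_temps + 1)
    for lam_prev, lam in zip(temps[:-1], temps[1:]):
        # Reweight by the incremental factor exp(-(lam - lam_prev) * w * loss).
        logw = -(lam - lam_prev) * w * loss
        wts = np.exp(logw - logw.max())
        wts /= wts.sum()
        # Multinomial resampling at every step (simple, not the most efficient choice).
        idx = rng.choice(n_particles, size=n_particles, p=wts)
        theta, loss = theta[idx], loss[idx]
        # Random-walk MH moves that leave the current tempered target invariant.
        for _ in range(n_moves):
            prop = theta + step * rng.standard_normal(n_particles)
            prop_loss = prequential_loss(prop, states, actions, noise)
            log_acc = (log_prior(prop) - lam * w * prop_loss) - (log_prior(theta) - lam * w * loss)
            accept = np.log(rng.uniform(size=n_particles)) < log_acc
            theta = np.where(accept, prop, theta)
            loss = np.where(accept, prop_loss, loss)
    return theta


# Simulated trajectory from the toy model with true theta = 0.5.
data_rng = np.random.default_rng(1)
T, m = 100, 10
actions = data_rng.choice([-1.0, 1.0], size=T)
states = np.zeros(T + 1)
for t in range(T):
    states[t + 1] = 0.5 * states[t] + actions[t] + data_rng.standard_normal()
noise = data_rng.standard_normal((T, m))  # fixed generator noise (common random numbers)
posterior = smc_sampler(states, actions, noise)
```

Tempering moves the particles gradually from the prior to the full generalized posterior, which avoids the weight collapse that a single reweighting step from the prior would cause. Holding the noise fixed is a simplification; drawing fresh generator samples at every evaluation would make the loss itself random.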

Taxonomy

Topics: Reinforcement Learning in Robotics