Generalized Bayesian deep reinforcement learning
Shreya Sinha Roy, Richard G. Everitt, Christian P. Robert, Ritabrata Dutta

TL;DR
This paper introduces a generalized Bayesian deep reinforcement learning framework that models environment dynamics with deep generative models, infers a generalized posterior over their parameters using a prequential scoring rule in place of a likelihood, and proposes an improved policy learning method called expected Thompson sampling.
Contribution
The paper develops a likelihood-free generalized Bayesian inference approach for deep generative models in RL and introduces expected Thompson sampling for policy learning.
Findings
The proposed method outperforms traditional Thompson sampling in simulations.
A Bernstein-von Mises type theorem provides theoretical justification for the approach.
The method is extended to continuous action spaces, with promising results.
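The contrast between Thompson sampling and expected Thompson sampling can be illustrated with a small tabular sketch. This summary does not spell out the algorithm, so the code assumes one plausible reading: classical Thompson sampling acts greedily for a single posterior draw of the model, while the "expected" variant acts greedily for action values averaged over several posterior draws. The tabular MDP, the Dirichlet stand-in for the posterior, and all function names are hypothetical illustrations, not the authors' implementation.

```python
import random

def q_values(P, R, gamma=0.9, iters=200):
    """Value iteration on a tabular MDP.
    P[s][a][s2] is a transition probability, R[s][a] is a reward."""
    S, A = len(R), len(R[0])
    Q = [[0.0] * A for _ in range(S)]
    for _ in range(iters):
        V = [max(row) for row in Q]
        Q = [[R[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in range(S))
              for a in range(A)] for s in range(S)]
    return Q

def greedy(Q):
    """Greedy action index for every state."""
    return [max(range(len(row)), key=row.__getitem__) for row in Q]

def thompson_policy(samples, R, rng):
    """Classical Thompson sampling: greedy policy for ONE posterior draw."""
    return greedy(q_values(rng.choice(samples), R))

def expected_thompson_policy(samples, R):
    """Assumed reading of ETS: greedy policy for the Q-function
    averaged over ALL available posterior draws."""
    Qs = [q_values(P, R) for P in samples]
    S, A = len(R), len(R[0])
    Q_mean = [[sum(Q[s][a] for Q in Qs) / len(Qs) for a in range(A)]
              for s in range(S)]
    return greedy(Q_mean)

def sample_dirichlet(alpha, rng):
    """Dirichlet draw via normalized Gamma variates (toy posterior stand-in)."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

# Toy setup: 3 states, 2 actions, 20 hypothetical posterior draws of the dynamics.
rng = random.Random(0)
S, A = 3, 2
R = [[0.0, 1.0], [0.5, 0.0], [0.0, 2.0]]
samples = [[[sample_dirichlet([1.0] * S, rng) for _ in range(A)]
            for _ in range(S)] for _ in range(20)]

ts_policy = thompson_policy(samples, R, rng)
ets_policy = expected_thompson_policy(samples, R)
```

In this sketch the Thompson policy changes with whichever draw is selected, whereas the averaged policy uses every draw and is deterministic given the posterior samples. In the paper's setting the draws would come from the SMC sampler over neural network parameters rather than from a Dirichlet.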
Abstract
Bayesian reinforcement learning (BRL) is a method that merges principles from Bayesian statistics and reinforcement learning to make optimal decisions in uncertain environments. As a model-based RL method, it has two key components: (1) inferring the posterior distribution of the model for the data-generating process (DGP) and (2) policy learning using the learned posterior. We propose to model the dynamics of the unknown environment through deep generative models, assuming Markov dependence. In the absence of likelihood functions for these models, we train them by learning a generalized predictive-sequential (or prequential) scoring rule (SR) posterior. We use sequential Monte Carlo (SMC) samplers to draw samples from this generalized Bayesian posterior distribution. In conjunction, to achieve scalability in the high-dimensional parameter space of the neural networks, we use the…
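The idea of a prequential scoring rule posterior can be made concrete with a minimal sketch. It assumes the energy score as the scoring rule, a hypothetical one-dimensional autoregressive simulator in place of a deep generative model, a Gaussian prior, and a fixed learning-rate weight `w`. None of these choices is stated in the abstract above. The unnormalized log posterior is the log prior minus `w` times the sum of one-step-ahead scores, each estimated from simulations only, so no likelihood is ever evaluated.

```python
import random

def energy_score(sims, y):
    """Unbiased sample estimate of the energy score (beta = 1) of the
    predictive distribution represented by `sims` at observation `y`."""
    m = len(sims)
    fit = 2.0 / m * sum(abs(x - y) for x in sims)
    spread = sum(abs(sims[j] - sims[k])
                 for j in range(m) for k in range(m) if j != k) / (m * (m - 1))
    return fit - spread

def simulate_next(theta, y_prev, rng):
    """Hypothetical one-step Markov simulator: we can sample from it,
    and we pretend its density is unavailable."""
    return theta * y_prev + rng.gauss(0.0, 1.0)

def prequential_loss(theta, ys, m=30, seed=0):
    """Sum of one-step-ahead scores along the observed trajectory."""
    rng = random.Random(seed)  # fixed seed: common random numbers across theta
    total = 0.0
    for y_prev, y in zip(ys[:-1], ys[1:]):
        sims = [simulate_next(theta, y_prev, rng) for _ in range(m)]
        total += energy_score(sims, y)
    return total

def log_sr_posterior(theta, ys, w=1.0, prior_sd=1.0):
    """Unnormalized log generalized posterior: log prior - w * prequential loss."""
    log_prior = -0.5 * (theta / prior_sd) ** 2
    return log_prior - w * prequential_loss(theta, ys)

# Synthetic trajectory from the simulator with theta = 0.8.
data_rng = random.Random(1)
ys = [0.0]
for _ in range(200):
    ys.append(simulate_next(0.8, ys[-1], data_rng))

lp_true = log_sr_posterior(0.8, ys)
lp_wrong = log_sr_posterior(-0.8, ys)
```

This function only defines the target density up to a constant. Drawing samples from it, which the abstract attributes to SMC samplers, is a separate step not shown here.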
Taxonomy
Topics: Reinforcement Learning in Robotics
