Bayesian policy gradient and actor-critic algorithms

Mohammad Ghavamzadeh; Yaakov Engel; Michal Valko

arXiv:2604.27563·cs.LG·May 1, 2026·34 cites

TL;DR

This paper introduces a Bayesian framework for policy gradient and actor-critic algorithms in reinforcement learning, reducing sample complexity and providing uncertainty estimates, with extensions to partially observable problems.

Contribution

It proposes a Bayesian approach that uses Gaussian processes for policy gradient estimation and for actor-critic models, lowering the number of samples needed for accurate gradient estimates and supplying the gradient covariance as an uncertainty measure.

Findings

01. Reduces the number of samples needed for accurate gradient estimates.

02. Provides uncertainty measures like gradient covariance at low additional cost.

03. Outperforms classic Monte-Carlo methods in experimental comparisons.

Abstract

Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policy by following a performance gradient estimate. Conventional policy gradient methods use Monte-Carlo techniques to estimate the gradient, which tend to have high variance, requiring many samples and resulting in slow convergence. We first propose a Bayesian framework for policy gradient, based on modeling the policy gradient as a Gaussian process. This reduces the number of samples needed to obtain accurate gradient estimates. Moreover, estimates of the natural gradient and a measure of the uncertainty in the gradient estimates, namely, the gradient covariance, are provided at little extra cost. Since the proposed framework considers system trajectories as its basic observable unit, it does not require the dynamics within trajectories to be of any particular form, and can be extended to…
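
To make the variance issue concrete, here is a minimal sketch of the conventional Monte-Carlo (score-function) gradient estimate that the abstract takes as its baseline. It is not the paper's Bayesian method. The one-step Gaussian policy, the quadratic return, and all function names are assumptions chosen only so that the true gradient is known in closed form.

```python
import numpy as np

def mc_policy_gradient(theta, n_trajectories, rng):
    """Score-function Monte-Carlo estimate of d/dtheta E[R(a)] for a ~ N(theta, 1)."""
    actions = rng.normal(loc=theta, scale=1.0, size=n_trajectories)
    returns = -(actions - 2.0) ** 2   # hypothetical toy return, maximal at a = 2
    scores = actions - theta          # d/dtheta log N(a; theta, 1)
    return float(np.mean(returns * scores))

def estimator_std(theta, n_trajectories, n_repeats=2000, seed=0):
    """Empirical standard deviation of the estimator over repeated runs."""
    rng = np.random.default_rng(seed)
    estimates = [mc_policy_gradient(theta, n_trajectories, rng)
                 for _ in range(n_repeats)]
    return float(np.std(estimates))

theta = 0.0
# E[R] = -((theta - 2)^2 + 1), so the exact gradient is -2 * (theta - 2).
true_gradient = -2.0 * (theta - 2.0)

std_small = estimator_std(theta, n_trajectories=10)
std_large = estimator_std(theta, n_trajectories=1000)
print(f"true gradient: {true_gradient}")
print(f"std with 10 samples:   {std_small:.3f}")
print(f"std with 1000 samples: {std_large:.3f}")
```

In this toy setting the spread of the estimate shrinks only at the usual Monte-Carlo rate of one over the square root of the sample count, which is the cost the Bayesian framework described in the abstract aims to reduce.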

Datasets

misovalko/my-research-papers (dataset · 103 dl)
