Bayesian policy gradient and actor-critic algorithms
Mohammad Ghavamzadeh, Yaakov Engel, Michal Valko

TL;DR
This paper introduces a Bayesian framework for policy gradient and actor-critic algorithms in reinforcement learning that reduces sample complexity, provides uncertainty estimates, and extends to partially observable problems.
Contribution
It proposes Bayesian approaches to policy gradient estimation and actor-critic learning that use Gaussian processes, improving sample efficiency and quantifying the uncertainty of the gradient estimates.
Findings
Reduces the number of samples needed for accurate gradient estimates.
Provides uncertainty measures like gradient covariance at low additional cost.
Outperforms classic Monte-Carlo methods in experimental comparisons.
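For context on the Monte-Carlo baseline, here is a minimal sketch of the likelihood-ratio (score-function) gradient estimate on a hypothetical one-step problem. It assumes a Gaussian policy a ~ N(theta, 1) and a toy reward R(a) = -(a - 2)^2. The reward, the policy, and the function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def reward(a):
    # Hypothetical toy reward, maximized at a = 2.
    return -(a - 2.0) ** 2


def mc_policy_gradient(theta, num_samples, rng):
    """Monte-Carlo likelihood-ratio estimate of d/dtheta E[R(a)] for a ~ N(theta, 1)."""
    a = rng.normal(loc=theta, scale=1.0, size=num_samples)
    score = a - theta  # d/dtheta log N(a; theta, 1)
    terms = reward(a) * score
    estimate = terms.mean()
    # Standard error of the sample mean: shrinks only as 1/sqrt(num_samples).
    std_error = terms.std(ddof=1) / np.sqrt(num_samples)
    return estimate, std_error


def true_gradient(theta):
    # eta(theta) = E[R(a)] = -((theta - 2)^2 + 1), so d eta / d theta = -2 (theta - 2).
    return -2.0 * (theta - 2.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for m in (10, 100, 10_000):
        est, se = mc_policy_gradient(0.0, m, rng)
        print(f"M={m:>6}: estimate={est:8.3f}  std.err={se:7.3f}  true={true_gradient(0.0):.1f}")
```

At theta = 0 the true gradient is 4. The printed standard errors show how slowly the plain Monte-Carlo estimate tightens as the number of samples grows, which is the high-variance, sample-hungry behaviour that the Bayesian framework is designed to reduce.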
Abstract
Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policy by following a performance gradient estimate. Conventional policy gradient methods use Monte-Carlo techniques to estimate the gradient, which tend to have high variance, requiring many samples and resulting in slow convergence. We first propose a Bayesian framework for policy gradient, based on modeling the policy gradient as a Gaussian process. This reduces the number of samples needed to obtain accurate gradient estimates. Moreover, estimates of the natural gradient and a measure of the uncertainty in the gradient estimates, namely, the gradient covariance, are provided at little extra cost. Since the proposed framework considers system trajectories as its basic observable unit, it does not require the dynamics within trajectories to be of any particular form, and can be extended to…
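Placing a Gaussian-process prior on an integrand and integrating the GP posterior in closed form is known as Bayesian quadrature. The sketch below applies it to a hypothetical 1-D toy integral, I = ∫ cos(x) N(x; 0, 1) dx, whose true value is e^{-1/2} ≈ 0.607. It assumes an RBF kernel so that the kernel integrals have closed forms. This is not the paper's trajectory-level model, kernel, or gradient estimator. It only illustrates how a GP turns samples into both an estimate and a posterior variance, the 1-D analogue of the gradient covariance mentioned in the abstract. All names and parameters are illustrative assumptions.

```python
import numpy as np


def rbf(x1, x2, lengthscale, amplitude):
    # Squared-exponential kernel k(x, x') = a^2 exp(-(x - x')^2 / (2 l^2)).
    d = x1[:, None] - x2[None, :]
    return amplitude**2 * np.exp(-0.5 * d**2 / lengthscale**2)


def bayesian_quadrature(x, y, mu, sigma, lengthscale=1.0, amplitude=1.0, jitter=1e-6):
    """Posterior mean and variance of I = ∫ f(x) N(x; mu, sigma^2) dx under a GP prior on f."""
    K = rbf(x, x, lengthscale, amplitude) + jitter * np.eye(len(x))
    s2 = lengthscale**2 + sigma**2
    # z_i = ∫ k(x, x_i) N(x; mu, sigma^2) dx, closed form for RBF kernel + Gaussian weight.
    z = amplitude**2 * lengthscale / np.sqrt(s2) * np.exp(-0.5 * (x - mu) ** 2 / s2)
    # z0 = ∫∫ k(x, x') p(x) p(x') dx dx' is the prior variance of I.
    z0 = amplitude**2 * lengthscale / np.sqrt(lengthscale**2 + 2.0 * sigma**2)
    mean = z @ np.linalg.solve(K, y)
    var = z0 - z @ np.linalg.solve(K, z)
    return float(mean), max(float(var), 0.0)  # clip tiny negative values from round-off


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 1.0, size=30)  # samples from the weighting density N(0, 1)
    y = np.cos(x)                      # integrand evaluations
    bq_mean, bq_var = bayesian_quadrature(x, y, mu=0.0, sigma=1.0)
    print(f"BQ mean={bq_mean:.4f}  BQ var={bq_var:.2e}")
    print(f"MC mean={y.mean():.4f}  true={np.exp(-0.5):.4f}")
```

Both estimators use the same 30 samples. On a smooth integrand like this one, the Bayesian-quadrature mean is typically much closer to the true value than the plain sample mean, and the returned variance quantifies the remaining uncertainty at the cost of one extra linear solve.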
