Deep Bayesian Quadrature Policy Optimization
Akella Ravi Tej, Kamyar Azizzadenesheli, Mohammad Ghavamzadeh, Anima Anandkumar, Yisong Yue

TL;DR
This paper introduces deep Bayesian quadrature policy gradient (DBQPG), a method for reinforcement learning that produces more accurate, lower-variance policy gradient estimates than Monte-Carlo estimation and also quantifies the uncertainty in those estimates.
Contribution
It presents a computationally efficient high-dimensional Bayesian quadrature approach for policy gradient estimation, outperforming Monte-Carlo methods in several benchmarks.
Findings
DBQPG yields more accurate gradient estimates with lower variance.
It improves sample efficiency and average returns in deep policy gradient algorithms.
Uncertainty in gradient estimates can be leveraged for further performance gains.
Abstract
We study the problem of obtaining accurate policy gradient estimates using a finite number of samples. Monte-Carlo methods have been the default choice for policy gradient estimation, despite suffering from high variance in the gradient estimates. On the other hand, more sample efficient alternatives like Bayesian quadrature methods have received little attention due to their high computational complexity. In this work, we propose deep Bayesian quadrature policy gradient (DBQPG), a computationally efficient high-dimensional generalization of Bayesian quadrature, for policy gradient estimation. We show that DBQPG can substitute Monte-Carlo estimation in policy gradient methods, and demonstrate its effectiveness on a set of continuous control benchmarks. In comparison to Monte-Carlo estimation, DBQPG provides (i) more accurate gradient estimates with a significantly lower variance, (ii) a…
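As a rough illustration of the contrast the abstract draws, the sketch below compares a Monte-Carlo estimate with classical Bayesian quadrature on a one-dimensional integral, E[cos(x)] under a standard normal. It is not DBQPG itself: the RBF kernel, the fixed length scale, the jitter value, and the function name `bayesian_quadrature` are all assumptions made for this example, and the paper's method targets high-dimensional policy gradients rather than a scalar integral. The closed-form kernel integrals used here hold only for an RBF kernel paired with a standard normal density.

```python
import numpy as np


def bayesian_quadrature(x, y, length_scale=1.0, jitter=1e-6):
    """Estimate the integral of f(x) N(x; 0, 1) dx from evaluations y = f(x).

    Hypothetical helper for illustration: a zero-mean GP prior on f with an
    RBF kernel, for which the kernel integrals are available in closed form.
    Returns the posterior mean and posterior variance of the integral.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    l2 = length_scale ** 2

    # Gram matrix of the RBF kernel, with jitter for numerical stability.
    diff = x[:, None] - x[None, :]
    K = np.exp(-0.5 * diff ** 2 / l2) + jitter * np.eye(len(x))

    # z_i = integral of k(x, x_i) N(x; 0, 1) dx (closed form for RBF + Gaussian).
    z = np.sqrt(l2 / (l2 + 1.0)) * np.exp(-0.5 * x ** 2 / (l2 + 1.0))

    # Quadrature weights w = K^{-1} z; the estimate is the weighted sum w^T y.
    weights = np.linalg.solve(K, z)
    mean = weights @ y

    # Prior variance of the integral minus the reduction from the observations.
    prior_var = np.sqrt(l2 / (l2 + 2.0))
    var = max(prior_var - z @ weights, 0.0)
    return mean, var


# Same samples, two estimators: a plain average versus the GP-weighted sum.
rng = np.random.default_rng(0)
samples = rng.standard_normal(20)
values = np.cos(samples)

mc_mean = values.mean()
bq_mean, bq_var = bayesian_quadrature(samples, values)

print("true value      :", np.exp(-0.5))
print("Monte-Carlo     :", mc_mean)
print("Bayesian quad.  :", bq_mean, "+/-", np.sqrt(bq_var))
```

The linear solve against the n-by-n Gram matrix is where the cost of classical Bayesian quadrature comes from, which is the computational burden the abstract refers to. The returned variance is the kind of uncertainty estimate that Monte-Carlo averaging does not provide directly.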
Taxonomy
Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
