Deep Bayesian Quadrature Policy Optimization

Akella Ravi Tej; Kamyar Azizzadenesheli; Mohammad Ghavamzadeh; Anima Anandkumar; Yisong Yue

arXiv:2006.15637 · cs.LG · December 17, 2020

TL;DR

This paper introduces deep Bayesian quadrature policy gradient (DBQPG), a method that produces more accurate, lower-variance policy gradient estimates in reinforcement learning and quantifies the uncertainty of those estimates.

Contribution

The paper presents a computationally efficient, high-dimensional generalization of Bayesian quadrature for policy gradient estimation and reports that it outperforms Monte-Carlo estimation on a set of continuous control benchmarks.

Findings

01 DBQPG yields more accurate gradient estimates with lower variance than Monte-Carlo estimation.

02 It improves sample efficiency and average returns in deep policy gradient algorithms.

03 Uncertainty in gradient estimates can be leveraged for further performance gains.

Abstract

We study the problem of obtaining accurate policy gradient estimates using a finite number of samples. Monte-Carlo methods have been the default choice for policy gradient estimation, despite suffering from high variance in the gradient estimates. On the other hand, more sample efficient alternatives like Bayesian quadrature methods have received little attention due to their high computational complexity. In this work, we propose deep Bayesian quadrature policy gradient (DBQPG), a computationally efficient high-dimensional generalization of Bayesian quadrature, for policy gradient estimation. We show that DBQPG can substitute Monte-Carlo estimation in policy gradient methods, and demonstrate its effectiveness on a set of continuous control benchmarks. In comparison to Monte-Carlo estimation, DBQPG provides (i) more accurate gradient estimates with a significantly lower variance, (ii) a…
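The contrast the abstract draws between Monte-Carlo estimation and Bayesian quadrature can be illustrated on a toy problem. The sketch below is not DBQPG and does not estimate a policy gradient. It only compares the two estimators on a one-dimensional integral, the expectation of cos(x) under a standard normal, whose exact value is exp(-1/2). The RBF kernel, the length scale `ell`, the `jitter` term, the test function, and all function names are assumptions chosen for illustration. Both estimators use the same samples. Bayesian quadrature additionally returns a posterior variance, which is the kind of uncertainty estimate the abstract refers to.

```python
import math
import random


def rbf(a, b, ell):
    """Squared-exponential kernel with length scale ell (an assumed choice)."""
    return math.exp(-((a - b) ** 2) / (2.0 * ell ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def monte_carlo(f, xs):
    """Plain sample average of f over the samples xs."""
    return sum(f(x) for x in xs) / len(xs)


def bayesian_quadrature(f, xs, ell=1.0, jitter=1e-6):
    """Posterior mean and variance of the integral of f(x) N(x; 0, 1) dx
    under a zero-mean GP prior on f with an RBF kernel."""
    n = len(xs)
    # Gram matrix, with a small diagonal jitter for numerical stability.
    K = [[rbf(xs[i], xs[j], ell) + (jitter if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # Kernel mean z_i = integral of k(x_i, x') N(x'; 0, 1) dx' (closed form).
    scale = ell / math.sqrt(ell ** 2 + 1.0)
    z = [scale * math.exp(-x ** 2 / (2.0 * (ell ** 2 + 1.0))) for x in xs]
    w = solve(K, z)  # quadrature weights K^{-1} z (K is symmetric)
    mean = sum(wi * f(x) for wi, x in zip(w, xs))
    # Prior variance of the integral: double integral of the kernel.
    prior_var = ell / math.sqrt(ell ** 2 + 2.0)
    var = max(0.0, prior_var - sum(wi * zi for wi, zi in zip(w, z)))
    return mean, var


rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(16)]
truth = math.exp(-0.5)  # exact value of E[cos(x)] for x ~ N(0, 1)

mc = monte_carlo(math.cos, samples)
bq_mean, bq_var = bayesian_quadrature(math.cos, samples)
print(f"truth={truth:.4f}  MC={mc:.4f}  BQ={bq_mean:.4f}  BQ var={bq_var:.2e}")
```

The linear solve costs O(n^3) in the number of samples, which is the computational burden the abstract cites as the reason Bayesian quadrature has seen little use for policy gradients.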

Code & Models

Repositories

Akella17/Deep-Bayesian-Quadrature-Policy-Optimization (PyTorch, official)

Taxonomy

Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications