Improving Deep Policy Gradients with Value Function Search

Enrico Marchesini; Christopher Amato

arXiv:2302.10145 · cs.LG · February 21, 2023

TL;DR

This paper introduces a Value Function Search method that enhances value network approximation in deep policy gradient algorithms, leading to better sample efficiency and higher-performing policies without extra environment interactions.

Contribution

The paper proposes a computationally inexpensive Value Function Search technique that improves value approximation and, in turn, the performance of deep policy gradient algorithms.

Findings

01

Improved value approximation enhances policy performance.

02

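Improving Deep PG primitives (value prediction, variance reduction, and correlation of gradient estimates with the true gradient) leads to higher returns on benchmarks.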

03

The method improves sample efficiency without requiring additional environment interactions.
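The first finding rests on a standard policy-gradient fact: subtracting a value estimate V(s) (a baseline) from the return leaves the gradient estimate unbiased but can shrink its variance, and how much it shrinks depends on how accurate the estimate is. The sketch below illustrates this on a hypothetical one-step problem with a Gaussian policy N(θ, 1) at θ = 0 and a toy reward `reward(a) = 50 + 4a - a²`. The reward, the baseline values, and the function names are assumptions chosen for illustration; they do not come from the paper.

```python
import random
import statistics


def reward(action: float) -> float:
    # Hypothetical toy reward: large constant offset, peaked at action = 2.
    return 50.0 + 4.0 * action - action ** 2


def score_function_samples(baseline: float, n: int = 50_000, seed: int = 0) -> list[float]:
    """REINFORCE-style gradient samples for a Gaussian policy N(theta, 1).

    d/dtheta log N(a; theta, 1) = (a - theta), so each sample is
    (a - theta) * (reward(a) - baseline). A constant baseline keeps the
    estimator unbiased; only its variance changes.
    """
    rng = random.Random(seed)  # same seed -> same actions for every baseline
    theta = 0.0
    samples = []
    for _ in range(n):
        a = rng.gauss(theta, 1.0)
        samples.append((a - theta) * (reward(a) - baseline))
    return samples


# E[reward(a)] = 50 - E[a^2] = 49 for a ~ N(0, 1): the "accurate" value estimate.
for name, b in [("no baseline", 0.0), ("poor value estimate", 30.0), ("accurate value estimate", 49.0)]:
    g = score_function_samples(b)
    print(f"{name:>24}: mean={statistics.fmean(g):6.2f}  variance={statistics.pvariance(g):8.1f}")
```

For this toy setup the true gradient is 4, and working out the moments analytically gives gradient-sample variances of 2247 with no baseline, 327 with the poor estimate (30), and 42 with the accurate one (49). All three estimators agree on the mean while their noise differs by more than an order of magnitude, which is the mechanism through which a better value approximation can translate into better policy updates.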

Abstract

Deep Policy Gradient (PG) algorithms employ value networks to drive the learning of parameterized policies and reduce the variance of the gradient estimates. However, value function approximation gets stuck in local optima and struggles to fit the actual return, limiting the variance reduction efficacy and leading policies to sub-optimal performance. This paper focuses on improving value approximation and analyzing the effects on Deep PG primitives such as value prediction, variance reduction, and correlation of gradient estimates with the true gradient. To this end, we introduce a Value Function Search that employs a population of perturbed value networks to search for a better approximation. Our framework does not require additional environment interactions, gradient computations, or ensembles, providing a computationally inexpensive approach to enhance the supervised learning task on…
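The abstract describes the search only at a high level: a population of perturbed value networks, evaluated without additional environment interactions, gradient computations, or ensembles. A minimal sketch of that idea follows, assuming (i) a linear function stands in for the value network, (ii) perturbations are isotropic Gaussian noise with one fixed scale `sigma`, and (iii) candidates are ranked by mean squared error against returns from a batch that has already been collected. `predict`, `mse`, `value_function_search`, `population`, and `sigma` are hypothetical names, not the paper's code, and the paper's actual perturbation scheme, population size, and schedule may differ.

```python
import random
from typing import Sequence

Params = list[float]  # [w_0, ..., w_{d-1}, bias]


def predict(params: Params, state: Sequence[float]) -> float:
    """Linear stand-in for a value network: V(s) = w . s + bias."""
    *weights, bias = params
    return sum(w * s for w, s in zip(weights, state)) + bias


def mse(params: Params, states: Sequence[Sequence[float]], returns: Sequence[float]) -> float:
    """Mean squared error between predicted values and observed returns."""
    errors = [(predict(params, s) - g) ** 2 for s, g in zip(states, returns)]
    return sum(errors) / len(errors)


def value_function_search(
    params: Params,
    states: Sequence[Sequence[float]],
    returns: Sequence[float],
    rng: random.Random,
    population: int = 16,
    sigma: float = 0.1,
) -> tuple[Params, float]:
    """One hypothetical search step over Gaussian-perturbed parameter copies.

    The current parameters stay in the comparison, so the returned loss never
    exceeds the starting loss. Only forward passes on an already collected
    batch are used: no environment steps and no gradients.
    """
    best, best_loss = list(params), mse(params, states, returns)
    for _ in range(population):
        candidate = [p + rng.gauss(0.0, sigma) for p in params]
        loss = mse(candidate, states, returns)
        if loss < best_loss:
            best, best_loss = candidate, loss
    return best, best_loss


# Usage on a synthetic batch (hypothetical data standing in for collected rollouts).
rng = random.Random(42)
states = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(256)]
returns = [2.0 * s0 - 1.0 * s1 + 0.5 + rng.gauss(0.0, 0.05) for s0, s1 in states]

params: Params = [0.0, 0.0, 0.0]
history = [mse(params, states, returns)]
for _ in range(50):
    params, loss = value_function_search(params, states, returns, rng)
    history.append(loss)
print(f"MSE before search: {history[0]:.3f}, after 50 steps: {history[-1]:.3f}")
```

Because each step keeps whichever candidate best fits the stored returns, the search costs only extra forward passes over data the agent already has. This sketch runs the search on its own; how such a step would be interleaved with the regular gradient-based value updates is not specified in the summary above.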

Videos

Improving Deep Policy Gradients with Value Function Search · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques