Improving Deep Policy Gradients with Value Function Search
Enrico Marchesini, Christopher Amato

TL;DR
This paper introduces a Value Function Search method that enhances value network approximation in deep policy gradient algorithms, leading to better sample efficiency and higher-performing policies without extra environment interactions.
Contribution
The paper proposes a novel, computationally inexpensive Value Function Search technique that improves value approximation and, in turn, deep policy gradient performance.
Findings
Improved value approximation enhances policy performance.
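Improving Deep PG primitives (value prediction, variance reduction, and correlation of gradient estimates with the true gradient) leads to higher returns on benchmarks.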
Method increases sample efficiency without extra environment interactions.
Abstract
Deep Policy Gradient (PG) algorithms employ value networks to drive the learning of parameterized policies and reduce the variance of the gradient estimates. However, value function approximation gets stuck in local optima and struggles to fit the actual return, limiting the variance reduction efficacy and leading policies to sub-optimal performance. This paper focuses on improving value approximation and analyzing the effects on Deep PG primitives such as value prediction, variance reduction, and correlation of gradient estimates with the true gradient. To this end, we introduce a Value Function Search that employs a population of perturbed value networks to search for a better approximation. Our framework does not require additional environment interactions, gradient computations, or ensembles, providing a computationally inexpensive approach to enhance the supervised learning task on…
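The search idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract is truncated here, so the linear value function, the Gaussian perturbation, the mean-squared-error selection criterion, and every name below (`value_function_search`, `pop_size`, `sigma`) are assumptions made for illustration. The sketch reuses only data that has already been collected and computes no gradients, in line with the abstract's claims.

```python
import numpy as np

def mse(weights, features, returns):
    """Mean squared error of a linear value estimate against observed returns."""
    return float(np.mean((features @ weights - returns) ** 2))

def value_function_search(weights, features, returns, pop_size=10, sigma=0.05, rng=None):
    """Hypothetical search step: perturb the current value weights and keep
    the candidate (or the original) with the lowest error on existing data."""
    rng = np.random.default_rng() if rng is None else rng
    best_w, best_err = weights, mse(weights, features, returns)
    for _ in range(pop_size):
        # Assumed perturbation scheme: additive Gaussian noise on the weights.
        candidate = weights + sigma * rng.standard_normal(weights.shape)
        err = mse(candidate, features, returns)
        if err < best_err:
            best_w, best_err = candidate, err
    return best_w, best_err

# Synthetic stand-in for a batch of visited states and their observed returns.
rng = np.random.default_rng(0)
features = rng.standard_normal((256, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
returns = features @ true_w + 0.1 * rng.standard_normal(256)

current_w = np.zeros(4)
initial_err = mse(current_w, features, returns)
new_w, new_err = value_function_search(current_w, features, returns, rng=rng)
```

Because the unperturbed weights stay in the comparison, the selected error can never exceed the starting error. In a deep PG setting the weights would be those of a value network, and a step like this would presumably be interleaved with ordinary gradient-based value updates.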
Taxonomy
Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques
