Coordinate-wise Control Variates for Deep Policy Gradients

Yuanyi Zhong; Yuan Zhou; Jian Peng

arXiv:2107.04987 · cs.LG · August 12, 2021

Open Access

TL;DR

This paper introduces coordinate-wise and layer-wise control variates using vector-valued baselines to reduce variance in deep policy gradient methods, improving sample efficiency in continuous control tasks.

Contribution

The paper studies the under-explored use of vector-valued baselines for variance reduction in policy gradients and integrates the resulting coordinate-wise and layer-wise control variates into PPO.

Findings

01. Vector-valued baselines yield lower gradient variance than the conventional scalar-valued baseline.

02. Sample efficiency improves on continuous control benchmarks.

03. Coordinate-wise control variates integrate effectively into PPO.

Abstract

The control variates (CV) method is widely used in policy gradient estimation to reduce the variance of the gradient estimators in practice. A control variate is applied by subtracting a baseline function from the state-action value estimates. Then the variance-reduced policy gradient presumably leads to higher learning efficiency. Recent research on control variates with deep neural net policies mainly focuses on scalar-valued baseline functions. The effect of vector-valued baselines is under-explored. This paper investigates variance reduction with coordinate-wise and layer-wise control variates constructed from vector-valued baselines for neural net policies. We present experimental evidence suggesting that lower variance can be obtained with such baselines than with the conventional scalar-valued baseline. We demonstrate how to equip the popular Proximal Policy Optimization (PPO)…
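The difference between a scalar baseline and a coordinate-wise one can be illustrated with a minimal NumPy sketch. It is not the paper's PPO implementation: the one-step Gaussian policy, the quadratic reward, and the names `baseline_comparison` and `total_variance` are all assumptions made for illustration, and the baselines here are constants fitted in-sample by least squares rather than learned functions of the state.

```python
import numpy as np

def total_variance(samples):
    """Sum of per-coordinate empirical variances of per-sample gradient terms."""
    return samples.var(axis=0).sum()

def baseline_comparison(n=5000, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical toy problem: one-step Gaussian policy a ~ N(theta, I) in 3-D.
    theta = np.array([0.5, -1.0, 2.0])
    a = theta + rng.standard_normal((n, theta.size))
    score = a - theta  # grad_theta log pi(a) for a unit-variance Gaussian
    # Hypothetical reward whose sensitivity differs strongly across coordinates.
    ret = -(a ** 2) @ np.array([1.0, 5.0, 0.2])

    f = score * ret[:, None]  # per-sample gradient terms without any baseline
    # In-sample least-squares fit: choose b to minimize Var[f_i - b * score_i].
    fc = f - f.mean(axis=0)
    sc = score - score.mean(axis=0)
    cov = (fc * sc).mean(axis=0)  # Cov(f_i, score_i) for each coordinate
    var = (sc ** 2).mean(axis=0)  # Var(score_i) for each coordinate
    b_scalar = cov.sum() / var.sum()  # one baseline shared by all coordinates
    b_vector = cov / var              # one baseline per coordinate

    return {
        "none": total_variance(f),
        "scalar": total_variance(score * (ret - b_scalar)[:, None]),
        "coordinate": total_variance(score * (ret[:, None] - b_vector)),
    }

if __name__ == "__main__":
    for name, value in baseline_comparison().items():
        print(f"{name:>10}: total variance = {value:.2f}")
```

Because the scalar baseline is a special case of the vector baseline with all entries equal, the coordinate-wise fit cannot do worse on the sample it was fitted to. How much of that gain carries over to state-dependent baselines and deep policies is the empirical question the paper examines.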

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Reinforcement Learning in Robotics · Optimization and Search Problems