Coordinate-wise Control Variates for Deep Policy Gradients
Yuanyi Zhong, Yuan Zhou, Jian Peng

TL;DR
This paper introduces coordinate-wise and layer-wise control variates using vector-valued baselines to reduce variance in deep policy gradient methods, improving sample efficiency in continuous control tasks.
Contribution
The paper studies vector-valued baselines, an under-explored alternative to the usual scalar-valued baseline, for variance reduction in policy gradients, and shows how to integrate them into PPO.
Findings
Vector-valued baselines yield lower gradient variance than the conventional scalar-valued baseline in the reported experiments.
Sample efficiency improves on continuous control benchmarks.
Coordinate-wise control variates can be integrated into PPO.
Abstract
The control variates (CV) method is widely used in policy gradient estimation to reduce the variance of the gradient estimators in practice. A control variate is applied by subtracting a baseline function from the state-action value estimates. Then the variance-reduced policy gradient presumably leads to higher learning efficiency. Recent research on control variates with deep neural net policies mainly focuses on scalar-valued baseline functions. The effect of vector-valued baselines is under-explored. This paper investigates variance reduction with coordinate-wise and layer-wise control variates constructed from vector-valued baselines for neural net policies. We present experimental evidence suggesting that lower variance can be obtained with such baselines than with the conventional scalar-valued baseline. We demonstrate how to equip the popular Proximal Policy Optimization (PPO)…
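To make the difference between a scalar-valued baseline and a coordinate-wise one concrete, here is a minimal NumPy sketch on a toy problem. It is not the authors' implementation. The single-state Gaussian policy, the quadratic reward, and the helper names (`second_moment`, `scalar_baseline`, `coordinate_baseline`) are all assumptions made for illustration. Because the score function has zero mean in expectation, subtracting a baseline does not change the expected gradient, so the sketch compares estimators by the empirical second moment of the per-sample terms `g_j * (q - b_j)`, summed over coordinates.

```python
import numpy as np

def second_moment(scores, returns, baseline):
    # Sum over coordinates j of the empirical second moment of g_j * (q - b_j).
    # `baseline` may be a scalar or a vector with one entry per coordinate.
    terms = scores * (returns[:, None] - baseline)
    return float(np.mean(terms ** 2, axis=0).sum())

def scalar_baseline(scores, returns):
    # One baseline shared by all coordinates: minimizes the summed second moment.
    w = scores ** 2
    return float((w * returns[:, None]).sum() / w.sum())

def coordinate_baseline(scores, returns):
    # One baseline per coordinate: minimizes each coordinate's second moment.
    w = scores ** 2
    return (w * returns[:, None]).sum(axis=0) / w.sum(axis=0)

# Hypothetical toy setup: a unit-variance Gaussian policy with mean theta.
rng = np.random.default_rng(0)
n, d = 5000, 4
theta = np.zeros(d)
actions = theta + rng.standard_normal((n, d))
scores = actions - theta                      # grad of log-density w.r.t. theta
target = np.array([1.0, -2.0, 0.5, 3.0])      # assumed reward optimum
returns = -np.sum((actions - target) ** 2, axis=1)

m_none = second_moment(scores, returns, 0.0)
m_scalar = second_moment(scores, returns, scalar_baseline(scores, returns))
m_coord = second_moment(scores, returns, coordinate_baseline(scores, returns))
print(f"no baseline: {m_none:.2f}  scalar: {m_scalar:.2f}  coordinate-wise: {m_coord:.2f}")
```

On any fixed sample, the coordinate-wise value cannot exceed the scalar one, since a scalar baseline is the special case in which every coordinate shares the same value. How large the gap is in practice, and how it carries over to deep networks and PPO, is the empirical question the paper addresses.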
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Reinforcement Learning in Robotics · Optimization and Search Problems
