A K-fold Method for Baseline Estimation in Policy Gradient Algorithms
Nithyanand Kota, Abhishek Mishra, Sunil Srinivasa, Xi (Peter) Chen, Pieter Abbeel

TL;DR
This paper introduces a K-fold method for baseline estimation in policy gradient algorithms to better balance bias and variance, improving performance in reinforcement learning tasks.
Contribution
The paper proposes a K-fold baseline estimation technique whose hyperparameter K adjusts the bias-variance trade-off of the baseline estimates used in policy gradient methods.
Findings
Improved stability and performance on three MuJoCo locomotion control tasks
Effective bias-variance trade-off adjustment via the K parameter
Demonstrated benefits on two state-of-the-art policy gradient algorithms
Abstract
The high variance issue in unbiased policy-gradient methods such as VPG and REINFORCE is typically mitigated by adding a baseline. However, the baseline fitting itself suffers from the underfitting or the overfitting problem. In this paper, we develop a K-fold method for baseline estimation in policy gradient algorithms. The parameter K is the baseline estimation hyperparameter that can adjust the bias-variance trade-off in the baseline estimates. We demonstrate the usefulness of our approach via two state-of-the-art policy gradient algorithms on three MuJoCo locomotive control tasks.
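The abstract does not spell out the procedure, so the following is only a minimal sketch of one plausible reading of K-fold baseline estimation: samples are split into K folds, a baseline is fit on K-1 folds, and that fit predicts the baseline for the held-out fold, so no sample's baseline is fit on its own return. The scalar feature, the linear baseline, and the names fit_line and kfold_baseline are all hypothetical choices made for illustration, not the authors' implementation.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y ~ intercept + slope * x for a scalar feature x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Fall back to a constant baseline when all training features coincide.
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y - slope * mean_x, slope

def kfold_baseline(xs, returns, k, seed=0):
    """Predict each sample's baseline from a model fit on the other K-1 folds.

    xs:      scalar state features (hypothetical; e.g. the time step).
    returns: empirical returns, one per feature.
    k:       number of folds (assumed 2 <= k <= len(xs)).
    """
    n = len(xs)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= len(xs)")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [order[i::k] for i in range(k)]
    baseline = [0.0] * n
    for held_out in folds:
        held = set(held_out)
        train = [i for i in order if i not in held]
        intercept, slope = fit_line([xs[i] for i in train],
                                    [returns[i] for i in train])
        # Out-of-fold prediction: these samples were not used in the fit.
        for i in held_out:
            baseline[i] = intercept + slope * xs[i]
    return baseline
```

Under these assumptions, advantages would be formed as `[r - b for r, b in zip(returns, kfold_baseline(xs, returns, k))]`. In a real policy gradient setup the folds would more naturally be formed over whole trajectories rather than individual samples, and the baseline would usually be a richer function approximator. K sets how much data each fit sees and how many fits are run.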
Taxonomy
Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Fuel Cells and Related Materials
