A K-fold Method for Baseline Estimation in Policy Gradient Algorithms

Nithyanand Kota; Abhishek Mishra; Sunil Srinivasa; Xi (Peter) Chen; Pieter Abbeel

arXiv:1701.00867 · cs.AI · January 5, 2017

TL;DR

This paper introduces a K-fold method for baseline estimation in policy gradient algorithms to better balance bias and variance, improving performance in reinforcement learning tasks.

Contribution

The paper proposes a K-fold baseline estimation technique for policy gradient methods, in which the hyperparameter K adjusts the bias-variance trade-off of the baseline estimates.
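
For context, the role of a baseline can be written down with the standard baseline-subtracted policy gradient estimator. This is general background rather than a formula quoted from this page, and the symbols (policy \pi_\theta, reward-to-go R_t, baseline b) are notation assumed here:

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\left[
      \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\,
      \bigl(R_t - b(s_t)\bigr)
    \right],
\qquad
R_t = \sum_{t'=t}^{T-1} r_{t'}
```

As long as b(s_t) does not depend on the action a_t, subtracting it leaves the expectation unchanged and can only affect the variance. In practice b is fitted from sampled returns, which is where the quality of the fit starts to matter.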

Findings

01 Improved stability and performance on MuJoCo control tasks

02 Effective bias-variance trade-off adjustment via the parameter K

03 Demonstrated benefits on two state-of-the-art policy gradient algorithms

Abstract

The high variance of unbiased policy-gradient methods such as VPG and REINFORCE is typically mitigated by adding a baseline. However, the baseline fit itself can suffer from underfitting or overfitting. In this paper, we develop a K-fold method for baseline estimation in policy gradient algorithms. The hyperparameter K adjusts the bias-variance trade-off in the baseline estimates. We demonstrate the usefulness of our approach via two state-of-the-art policy gradient algorithms on three MuJoCo locomotion control tasks.
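
The abstract does not spell out the procedure, so the following is only a minimal sketch under an assumption: that the K-fold method resembles K-fold cross-validation, where trajectories in a batch are split into K folds and the baseline for each fold is predicted by a model fitted on the remaining folds. The function names, the ridge-regularized linear baseline, and the synthetic data are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def fit_linear_baseline(features, returns, ridge=1e-3):
    """Ridge-regularized least squares; returns weights w with features @ w ~ returns."""
    dim = features.shape[1]
    gram = features.T @ features + ridge * np.eye(dim)
    return np.linalg.solve(gram, features.T @ returns)


def kfold_baseline(features, returns, traj_ids, k, seed=0):
    """Out-of-fold baseline predictions (assumed reading of the K-fold idea).

    Folds are formed over whole trajectories, so a baseline value for a
    time step never comes from a model fitted on that step's own trajectory.
    """
    unique_ids = np.unique(traj_ids)
    if not 2 <= k <= len(unique_ids):
        raise ValueError("k must be between 2 and the number of trajectories")
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(unique_ids), k)

    baseline = np.empty(len(returns), dtype=float)
    for held_out in folds:
        test = np.isin(traj_ids, held_out)
        weights = fit_linear_baseline(features[~test], returns[~test])
        baseline[test] = features[test] @ weights
    return baseline


# Synthetic batch: 12 trajectories of 20 steps, 3 state features plus a bias term.
rng = np.random.default_rng(1)
n_traj, horizon = 12, 20
traj_ids = np.repeat(np.arange(n_traj), horizon)
states = rng.normal(size=(n_traj * horizon, 3))
features = np.hstack([states, np.ones((len(states), 1))])
returns = states @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=len(states))

baseline = kfold_baseline(features, returns, traj_ids, k=4)
advantages = returns - baseline  # would feed into the policy gradient estimate
```

In this sketch, a larger K means each fit sees a larger share of the batch, while a smaller K leaves more data out of each fit. How the paper itself maps K to the bias-variance trade-off should be checked against the full text.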

Taxonomy

Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Fuel Cells and Related Materials