Improving Gradient Estimation by Incorporating Sensor Data

Gregory Lawrence; Stuart Russell

arXiv:1206.3272·cs.AI·June 18, 2012

TL;DR

This paper proposes a policy gradient estimator that uses sensor data, in addition to rewards, to reduce the variance of the gradient estimate and so speed up learning in policy search.

Contribution

It introduces a gradient estimator that uses sensor data alongside rewards, and shows both theoretically and empirically that this gives better gradient estimates and hence faster learning.

Findings

01 Sensor data improves gradient estimation accuracy.

02 Incorporating sensor data reduces variance in policy gradient estimates.

03 The method accelerates policy learning in noisy environments.

Abstract

An efficient policy search algorithm should estimate the local gradient of the objective function, with respect to the policy parameters, from as few trials as possible. Whereas most policy search methods estimate this gradient by observing the rewards obtained during policy trials, we show, both theoretically and empirically, that taking into account the sensor data as well gives better gradient estimates and hence faster learning. The reason is that rewards obtained during policy execution vary from trial to trial due to noise in the environment; sensor data, which correlates with the noise, can be used to partially correct for this variation, resulting in an estimator with lower variance.
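The variance argument in the abstract can be illustrated with a small simulation. The sketch below is not the authors' estimator. It is a toy regression example built on assumptions of our own: a single policy parameter, a hypothetical quadratic reward, Gaussian environment noise, and a sensor reading equal to that noise plus a small measurement error. All function names and constants are illustrative. The reward-only estimator regresses reward on the parameter perturbation; the sensor-aware estimator adds the sensor reading as a second regressor so that the reward variation it explains is removed before the slope is read off.

```python
import random
import statistics


def run_trials(theta, n_trials, rng):
    """Simulate policy trials around theta on a hypothetical toy task."""
    deltas, rewards, sensors = [], [], []
    for _ in range(n_trials):
        delta = rng.gauss(0.0, 0.1)           # exploration perturbation of theta
        noise = rng.gauss(0.0, 1.0)           # environment noise that shifts the reward
        sensor = noise + rng.gauss(0.0, 0.1)  # sensor reading correlated with the noise
        reward = -(theta + delta - 1.0) ** 2 + noise
        deltas.append(delta)
        rewards.append(reward)
        sensors.append(sensor)
    return deltas, rewards, sensors


def _scatter(xs, ys):
    """Sum of centered cross-products (unnormalized covariance)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


def reward_only_gradient(deltas, rewards):
    """Least-squares slope of reward on perturbation."""
    return _scatter(deltas, rewards) / _scatter(deltas, deltas)


def sensor_aware_gradient(deltas, rewards, sensors):
    """Slope on perturbation from a two-regressor least-squares fit
    of reward on (perturbation, sensor), via the 2x2 normal equations."""
    s_dd = _scatter(deltas, deltas)
    s_ss = _scatter(sensors, sensors)
    s_ds = _scatter(deltas, sensors)
    s_dr = _scatter(deltas, rewards)
    s_sr = _scatter(sensors, rewards)
    det = s_dd * s_ss - s_ds ** 2
    return (s_dr * s_ss - s_ds * s_sr) / det


def compare(n_repeats=300, n_trials=20, seed=0):
    """Repeat the experiment and collect both gradient estimates."""
    rng = random.Random(seed)
    plain, aware = [], []
    for _ in range(n_repeats):
        d, r, s = run_trials(0.0, n_trials, rng)
        plain.append(reward_only_gradient(d, r))
        aware.append(sensor_aware_gradient(d, r, s))
    return plain, aware


if __name__ == "__main__":
    plain, aware = compare()
    print("reward-only variance: ", statistics.variance(plain))
    print("sensor-aware variance:", statistics.variance(aware))
```

In this toy setup the true gradient at theta = 0 is 2, and both estimators target it. The sensor-aware estimate should show a much smaller spread across repetitions, because most of the trial-to-trial reward variation is explained by the sensor reading. How large the reduction is in practice depends on how strongly the sensor data correlates with the noise.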

Taxonomy

Topics: Reinforcement Learning in Robotics · Motor Control and Adaptation · Robot Manipulation and Learning