Directly Estimating the Variance of the λ-Return Using Temporal-Difference Methods

Craig Sherstan; Brendan Bennett; Kenny Young; Dylan R. Ashley; Adam White; Martha White; Richard S. Sutton

arXiv:1801.08287 · cs.AI · February 15, 2018

TL;DR

This paper introduces a simple and robust method for directly estimating the variance of the λ-return in reinforcement learning, improving risk assessment and parameter adaptation during online learning.

Contribution

We propose a novel, simpler approach that estimates the variance of the λ-return directly. It performs at least as well as more complex existing methods and is more robust in empirical tests.
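One way to see how a direct estimate can work: the λ-return satisfies G^λ_t − v(S_t) = δ_t + γλ (G^λ_{t+1} − v(S_{t+1})), where δ_t is the TD error. If v is accurate, squaring and taking expectations gives a Bellman-like equation for the variance, with δ_t² acting as a reward and (γλ)² as a discount. The sketch below applies that idea in a tabular setting. It is an illustration under these assumptions, not the authors' code, and it may differ from the paper's exact algorithm. The two-state chain, the function name `direct_variance_td`, and the step sizes are all hypothetical choices made for the example.

```python
import random


def direct_variance_td(num_episodes=20000, gamma=1.0, lam=0.5,
                       alpha=0.01, alpha_var=0.01, seed=0):
    """Tabular sketch: learn values with TD(0) and, alongside, a variance
    estimate of the lambda-return using delta^2 as a meta-reward and
    (gamma * lam)^2 as a meta-discount.

    Hypothetical environment: a chain 0 -> 1 -> terminal where every
    transition pays +1 or -1 with equal probability.
    """
    rng = random.Random(seed)
    n_states = 2
    terminal = n_states                 # index of the terminal state
    value = [0.0] * (n_states + 1)      # terminal entry stays 0
    variance = [0.0] * (n_states + 1)   # terminal entry stays 0

    for _ in range(num_episodes):
        for s in range(n_states):
            s_next = s + 1
            reward = rng.choice((-1.0, 1.0))

            # Ordinary TD error of the value estimator.
            delta = reward + gamma * value[s_next] - value[s]

            # TD error of the variance estimator (assumed formulation):
            # meta-reward delta^2, meta-discount (gamma * lam)^2.
            delta_var = (delta ** 2
                         + (gamma * lam) ** 2 * variance[s_next]
                         - variance[s])

            value[s] += alpha * delta
            variance[s] += alpha_var * delta_var

    return value[:terminal], variance[:terminal]
```

In this chain the true values are 0, so with γ = 1 and λ = 0.5 the λ-return from state 0 is R_1 + 0.5 R_2, whose variance is 1.25, and from state 1 it is R_2, whose variance is 1. Calling `direct_variance_td()` should return variance estimates close to those numbers, up to the noise of a constant step size.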

Findings

1. The new method is simpler than prior approaches.
2. It performs at least as well as existing methods.
3. It demonstrates increased robustness in empirical tests.

Abstract

This paper investigates estimating the variance of a temporal-difference learning agent's update target. Most reinforcement learning methods use an estimate of the value function, which captures how good it is for the agent to be in a particular state and is mathematically expressed as the expected sum of discounted future rewards (called the return). These values can be straightforwardly estimated by averaging batches of returns using Monte Carlo methods. However, if we wish to update the agent's value estimates during learning, before terminal outcomes are observed, we must use a different estimation target called the λ-return, which truncates the return with the agent's own estimate of the value function. Temporal difference learning methods estimate the expected λ-return for each state, allowing these methods to update online and incrementally, and in most cases…
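To make the λ-return concrete, here is a minimal sketch that computes it for one recorded episode with the standard backward recursion G^λ_t = R_{t+1} + γ[(1 − λ) v(S_{t+1}) + λ G^λ_{t+1}]. The function name `lambda_returns` and its argument layout are assumptions made for this example and do not come from the paper.

```python
from typing import List, Sequence


def lambda_returns(rewards: Sequence[float],
                   next_values: Sequence[float],
                   gamma: float,
                   lam: float) -> List[float]:
    """Compute the lambda-return for every step of one finished episode.

    rewards[t]     is R_{t+1}, the reward received after step t.
    next_values[t] is the agent's estimate v(S_{t+1}); use 0.0 for the
                   terminal state at the end of the episode.
    """
    if len(rewards) != len(next_values):
        raise ValueError("rewards and next_values must have equal length")

    returns = [0.0] * len(rewards)
    g_next = 0.0  # lambda-return after the terminal state is zero
    for t in reversed(range(len(rewards))):
        # Mix the bootstrapped estimate with the longer return.
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g_next)
        returns[t] = g
        g_next = g
    return returns
```

With λ = 1 the recursion ignores the value estimates and reduces to the Monte Carlo return. With λ = 0 it reduces to the one-step target R_{t+1} + γ v(S_{t+1}). Intermediate values blend the two.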

Taxonomy

Topics: Fault Detection and Control Systems