Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning

Kristopher De Asis; Alan Chan; Silviu Pitis; Richard S. Sutton; Daniel Graves

arXiv:1909.03906 · cs.LG · February 12, 2020

TL;DR

This paper introduces fixed-horizon temporal difference (TD) methods for reinforcement learning. These methods learn value functions that predict the sum of rewards over a fixed number of future time steps, which avoids the stability issues of other off-policy TD methods with function approximation. The paper also proves convergence.

Contribution

The paper presents a fixed-horizon TD approach in which no value function bootstraps from itself, making it immune to the stability problems known as the deadly triad. It proves convergence and demonstrates the approach's effectiveness.

Findings

01 Fixed-horizon methods are stable and avoid the deadly triad.

02 They can be used competitively with Q-learning.

03 Convergence is proven for linear and general function approximation.

Abstract

We explore fixed-horizon temporal difference (TD) methods, reinforcement learning algorithms for a new kind of value function that predicts the sum of rewards over a fixed number of future time steps. To learn the value function for horizon $h$, these algorithms bootstrap from the value function for horizon $h - 1$, or some shorter horizon. Because no value function bootstraps from itself, fixed-horizon methods are immune to the stability problems that plague other off-policy TD methods using function approximation (also known as "the deadly triad"). Although fixed-horizon methods require the storage of additional value functions, this gives the agent additional predictive power, while the added complexity can be substantially reduced via parallel updates, shared weights, and $n$-step bootstrapping. We show how to use fixed-horizon value functions to solve reinforcement…
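The bootstrapping scheme described in the abstract can be illustrated with a minimal tabular sketch. It assumes a one-step update in which the horizon-$h$ estimate moves toward the reward plus the discounted horizon-$(h-1)$ estimate of the next state, with the horizon-0 value fixed at zero. The function name `fixed_horizon_td`, the `(state, reward, next_state)` transition format, and the step size and discount values are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def fixed_horizon_td(transitions, n_states, H, alpha=0.1, gamma=1.0):
    """Tabular one-step fixed-horizon TD prediction (illustrative sketch).

    V[h, s] estimates the discounted sum of the next h rewards from state s.
    V[0] is identically zero and is never updated, so no value function
    ever bootstraps from itself.
    """
    V = np.zeros((H + 1, n_states))
    for s, r, s_next in transitions:
        # Targets for horizons 1..H bootstrap from horizons 0..H-1.
        # The right-hand side is evaluated before the in-place add,
        # so all horizons are updated in parallel from the old estimates.
        targets = r + gamma * V[:-1, s_next]
        V[1:, s] += alpha * (targets - V[1:, s])
    return V


# Hypothetical two-state loop: 0 -> 1 yields reward 1, 1 -> 0 yields reward 0.
transitions = [(0, 1.0, 1), (1, 0.0, 0)] * 200
V = fixed_horizon_td(transitions, n_states=2, H=3, alpha=0.5)
print(V)
```

In this toy loop, row `h` of `V` approaches the sum of the next `h` rewards from each state. Updating every horizon from a single transition is one way to read the "parallel updates" mentioned in the abstract.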

Taxonomy

Methods: Q-Learning