Schedule Based Temporal Difference Algorithms

Rohan Deb; Meet Gandhi; Shalabh Bhatnagar

arXiv:2111.11768·cs.LG·November 24, 2021

Open Access

TL;DR

This paper introduces a flexible schedule-based extension to TD($\lambda$) algorithms, allowing the user to specify how weights are assigned to different $n$-step returns over time, with proven convergence.

Contribution

It proposes a novel $\lambda$-schedule procedure for TD algorithms, enabling dynamic weight assignment and providing convergence guarantees for on-policy and off-policy variants.

Findings

01

The $\lambda$-schedule generalizes TD($\lambda$) with time-varying parameters.

02

Proposed algorithms converge almost surely under general conditions.

03

Flexible weight assignment improves adaptability of TD methods.

Abstract

Learning the value function of a given policy from data samples is an important problem in Reinforcement Learning. TD($λ$) is a popular class of algorithms to solve this problem. However, the weights assigned to different $n$-step returns in TD($λ$), controlled by the parameter $λ$, decrease exponentially with increasing $n$. In this paper, we present a $λ$-schedule procedure that generalizes the TD($λ$) algorithm to the case when the parameter $λ$ could vary with time-step. This allows flexibility in weight assignment, i.e., the user can specify the weights assigned to different $n$-step returns by choosing a sequence $\{λ_{t}\}_{t \geq 1}$. Based on this procedure, we propose an on-policy algorithm, TD($λ$)-schedule, and two off-policy algorithms, GTD($λ$)-schedule and TDC($λ$)-schedule. We provide proofs of…
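To make the exponential decay concrete, here is a minimal Python sketch. The first function computes the standard TD($λ$) weight $(1-λ)λ^{n-1}$ on the $n$-step return. The second function, `schedule_weights`, is a hypothetical illustration of how a sequence $λ_1, λ_2, \dots$ could reshape those weights using a simple product form. That form is an assumption made here for illustration and may differ from the construction used in the paper. Both function names are invented for this example.

```python
def td_lambda_weights(lam, n_max):
    """Standard TD(lambda) weights (1 - lam) * lam**(n - 1) for n = 1..n_max."""
    return [(1.0 - lam) * lam ** (n - 1) for n in range(1, n_max + 1)]


def schedule_weights(lams):
    """Hypothetical product-form weights for a sequence [lam_1, lam_2, ...].

    The n-step return gets weight (1 - lam_n) * lam_1 * ... * lam_{n-1}.
    ASSUMPTION: this form is illustrative only, not the paper's definition.
    """
    weights = []
    carried = 1.0  # running product lam_1 * ... * lam_{n-1}
    for lam in lams:
        weights.append((1.0 - lam) * carried)
        carried *= lam
    return weights


# Constant lambda: each weight is half of the previous one.
print(td_lambda_weights(0.5, 3))         # [0.5, 0.25, 0.125]

# A varying sequence: the last two returns now share the same weight.
print(schedule_weights([0.5, 0.5, 0.0]))  # [0.5, 0.25, 0.25]
```

With a constant sequence the second function reproduces the first, so under this assumed form the constant-$λ$ case is recovered as a special case. A sequence that ends in 0 puts all remaining weight on the last return, so the weights sum to 1.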

Taxonomy

Topics: Reinforcement Learning in Robotics · Formal Methods in Verification