Schedule Based Temporal Difference Algorithms
Rohan Deb, Meet Gandhi, Shalabh Bhatnagar

TL;DR
This paper introduces a flexible schedule-based extension to TD($\lambda$) algorithms, allowing the user to specify how weights are assigned to different $n$-step returns over time, with proven convergence.
Contribution
It proposes a novel $\lambda$-schedule procedure for TD algorithms, enabling dynamic weight assignment and providing convergence guarantees for on-policy and off-policy variants.
Findings
The $\lambda$-schedule generalizes TD($\lambda$) with time-varying parameters.
Proposed algorithms converge almost surely under general conditions.
Flexible weight assignment improves adaptability of TD methods.
Abstract
Learning the value function of a given policy from data samples is an important problem in Reinforcement Learning. TD($\lambda$) is a popular class of algorithms to solve this problem. However, the weights assigned to different $n$-step returns in TD($\lambda$), controlled by the parameter $\lambda$, decrease exponentially with increasing $n$. In this paper, we present a $\lambda$-schedule procedure that generalizes the TD($\lambda$) algorithm to the case when the parameter $\lambda$ could vary with time-step. This allows flexibility in weight assignment, i.e., the user can specify the weights assigned to different $n$-step returns by choosing a sequence $\{\lambda_t\}_{t \ge 1}$. Based on this procedure, we propose an on-policy algorithm - TD($\lambda$)-schedule, and two off-policy algorithms - GTD($\lambda$)-schedule and TDC($\lambda$)-schedule, respectively. We provide proofs of…
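The exponential decay mentioned in the abstract can be made concrete with a small sketch. The first function computes the standard TD($\lambda$) weights $(1-\lambda)\lambda^{n-1}$ on the $n$-step returns. The second shows one natural way a per-step sequence could reshape those weights; it is an illustrative assumption, not necessarily the construction used in the paper, and the function names are hypothetical.

```python
def lambda_return_weights(lam, n_max):
    """Standard TD(lambda) weights (1 - lam) * lam**(n - 1) for n = 1..n_max."""
    return [(1.0 - lam) * lam ** (n - 1) for n in range(1, n_max + 1)]


def schedule_weights(lams):
    """Weights induced by a per-step sequence lams = [lam_1, lam_2, ...].

    ASSUMPTION (illustrative only): the n-step return receives weight
    (1 - lam_n) * lam_1 * ... * lam_{n-1}. Returns the weights and the
    leftover mass that would go to longer returns.
    """
    weights = []
    carry = 1.0  # product of the lambdas seen so far
    for lam in lams:
        weights.append(carry * (1.0 - lam))
        carry *= lam
    return weights, carry


if __name__ == "__main__":
    # Constant lambda: weights halve at every step.
    print(lambda_return_weights(0.5, 4))
    # A varying sequence: weights no longer have to decay geometrically.
    print(schedule_weights([0.9, 0.5, 0.0]))
```

With a constant sequence the second function reproduces the first, so under this assumption the schedule strictly generalizes the fixed-$\lambda$ case; setting a term of the sequence to 0 cuts off all longer returns.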
Taxonomy
Topics: Reinforcement Learning in Robotics · Formal Methods in Verification
