Parameter-free Gradient Temporal Difference Learning

Andrew Jacobsen; Alan Chan

arXiv:2105.04129 · cs.LG · May 11, 2021

TL;DR

This paper introduces parameter-free, gradient-based temporal difference algorithms for reinforcement learning that are stable, efficient, and require no hyperparameter tuning, demonstrating competitive performance in large-scale, off-policy settings.

Contribution

The authors develop the first parameter-free, gradient-based TD algorithms with convergence guarantees. They do this by combining parameter-free methods from online learning with the stability of gradient-based temporal difference learning.
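As background for the online-learning side of this combination, here is a sketch of one classic parameter-free method from that literature: the one-dimensional Krichevsky-Trofimov (KT) coin-betting learner. It is not the authors' algorithm. The function names, the initial wealth of 1.0, and the toy objective |x - 10| are illustrative assumptions. The learner never receives a step size. Each prediction is a bet equal to a signed fraction of its accumulated wealth.

```python
def kt_coin_betting(subgradient, rounds, initial_wealth=1.0):
    """One-dimensional Krichevsky-Trofimov (KT) coin-betting learner (hypothetical helper).

    No learning rate: each prediction is a signed fraction of the current
    wealth, and the fraction is the running average of past coin outcomes.
    Subgradients are assumed to lie in [-1, 1].
    """
    wealth = initial_wealth
    coin_sum = 0.0            # sum of past coin outcomes c_i = -g_i
    iterates = []
    for t in range(1, rounds + 1):
        beta = coin_sum / t   # KT betting fraction; |beta| < 1 keeps wealth positive
        x = beta * wealth     # the bet is also this round's prediction
        g = subgradient(x)
        assert -1.0 <= g <= 1.0, "coin betting assumes |g| <= 1"
        wealth -= g * x       # win or lose the bet
        coin_sum -= g
        iterates.append(x)
    return iterates


# Example: minimize f(x) = |x - 10| without choosing a step size.
def sign_subgradient(x, target=10.0):
    """Subgradient of |x - target|: -1, 0 or +1."""
    return (x > target) - (x < target)


xs = kt_coin_betting(sign_subgradient, rounds=20000)
x_bar = sum(xs) / len(xs)     # online-to-batch conversion: average the iterates
print(f"average iterate: {x_bar:.3f}")
```

Averaging the iterates turns the learner's regret bound into a convergence guarantee for the convex objective. The average should therefore land near the minimizer x = 10 with no tuning, although the exact value depends on the number of rounds.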

Findings

01

Algorithms run in linear time.

02

Achieve high-probability convergence guarantees matching those of GTD2 up to logarithmic factors.

03

Maintain high prediction performance without hyperparameter tuning.
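For reference, below is a sketch of the GTD2 baseline mentioned in the findings. This is the textbook GTD2 update (Sutton et al., 2009), not the parameter-free variant. It assumes linear value estimates θᵀφ, an importance-sampling ratio ρ for off-policy correction, and plain Python lists for feature vectors. The function and argument names are hypothetical. Each update touches every feature a constant number of times, so its cost is linear in the number of features d. The sketch also exposes the two step sizes, α and β, whose tuning is the cost the paper aims to remove.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def gtd2_step(theta, w, phi, reward, phi_next, rho, gamma, alpha, beta):
    """One GTD2 update with linear features; O(d) time for d features.

    theta: value weights; w: auxiliary weights; rho: importance ratio
    pi(a|s) / mu(a|s). alpha and beta are the step sizes GTD2 needs tuned.
    """
    # TD error under the current value weights
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    phi_w = dot(phi, w)
    # Value weights move along (phi - gamma * phi_next), scaled by phi^T w
    new_theta = [th + alpha * rho * (p - gamma * pn) * phi_w
                 for th, p, pn in zip(theta, phi, phi_next)]
    # Auxiliary weights: least-mean-squares regression of delta on phi
    new_w = [wi + beta * rho * (delta - phi_w) * p for wi, p in zip(w, phi)]
    return new_theta, new_w


# Toy check: one state that loops to itself with reward 1 and gamma = 0.5,
# so its true value is 1 / (1 - 0.5) = 2. On-policy, hence rho = 1.
theta, w = [0.0], [0.0]
for _ in range(2000):
    theta, w = gtd2_step(theta, w, phi=[1.0], reward=1.0, phi_next=[1.0],
                         rho=1.0, gamma=0.5, alpha=0.1, beta=0.1)
print(f"estimated value: {theta[0]:.4f}")
```

With α = β = 0.1 this toy problem converges. Other step-size pairs can be much slower or unstable, which illustrates the tuning burden described in the abstract.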

Abstract

Reinforcement learning lies at the intersection of several challenges. Many applications of interest involve extremely large state spaces, requiring function approximation to enable tractable computation. In addition, the learner has only a single stream of experience with which to evaluate a large number of possible courses of action, necessitating algorithms that can learn off-policy. However, the combination of off-policy learning with function approximation can cause temporal difference methods to diverge. Recent work on gradient-based temporal difference methods has promised a path to stability, but at the cost of expensive hyperparameter tuning. In parallel, progress in online learning has provided parameter-free methods that achieve minimax optimal guarantees up to logarithmic terms, but their application in reinforcement learning has yet to be explored. In this work, we…
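A small example from Sutton and Barto's textbook (Reinforcement Learning: An Introduction, 2nd ed., Section 11.2) shows how off-policy learning with function approximation can diverge. It has two states whose linear value estimates are θ and 2θ (features 1 and 2), joined by a zero-reward transition from the first to the second. Suppose the behavior policy repeatedly updates on only this transition. Semi-gradient TD(0) then multiplies θ by 1 + α(2γ - 1) per update, so θ grows without bound whenever γ > 1/2. The sketch below just iterates that update. The function name and the values α = 0.1 and γ ∈ {0.9, 0.4} are illustrative choices.

```python
def semi_gradient_td0_on_transition(theta0, alpha, gamma, updates):
    """Repeat one semi-gradient TD(0) update on the transition theta -> 2*theta.

    The source state has feature 1 and the next state has feature 2, so the
    value estimates are theta and 2*theta; the reward is 0.
    """
    phi, phi_next, reward = 1.0, 2.0, 0.0
    theta = theta0
    history = [theta]
    for _ in range(updates):
        delta = reward + gamma * theta * phi_next - theta * phi  # TD error = (2*gamma - 1) * theta
        theta += alpha * delta * phi                             # semi-gradient TD(0) step
        history.append(theta)
    return history


diverging = semi_gradient_td0_on_transition(1.0, alpha=0.1, gamma=0.9, updates=100)
shrinking = semi_gradient_td0_on_transition(1.0, alpha=0.1, gamma=0.4, updates=100)
print(f"gamma=0.9: theta after 100 updates = {diverging[-1]:.1f}")
print(f"gamma=0.4: theta after 100 updates = {shrinking[-1]:.4f}")
```

With γ = 0.9 each update multiplies θ by 1.08. With γ = 0.4 the factor is 0.98, so θ shrinks toward zero instead.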

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Model Reduction and Neural Networks