Parameter-free Gradient Temporal Difference Learning
Andrew Jacobsen, Alan Chan

TL;DR
This paper introduces parameter-free, gradient-based temporal difference algorithms for reinforcement learning that are stable, efficient, and require no hyperparameter tuning, demonstrating competitive performance in large-scale, off-policy settings.
Contribution
The authors develop the first parameter-free, gradient-based TD algorithms with convergence guarantees, combining parameter-free online learning techniques with the stability of gradient TD methods.
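To make "parameter-free" concrete, here is a minimal sketch of one standard parameter-free online learner, the Krichevsky-Trofimov (KT) coin-betting algorithm from the online learning literature, applied to a one-dimensional convex problem. It is not the authors' TD algorithm. The function name `kt_coin_betting`, the test objective |w - 3|, and the initial wealth of 1 are illustrative assumptions. The point is that no step size appears: each iterate is a signed fraction of accumulated "wealth", and that fraction is set by the past gradients alone.

```python
def kt_coin_betting(subgradient, num_rounds, initial_wealth=1.0):
    """Minimize a 1-D convex function with the KT coin-betting learner.

    Hypothetical helper for illustration. subgradient(w) must return a
    value in [-1, 1]. No learning rate is used anywhere. Returns the
    average iterate (online-to-batch conversion).
    """
    wealth = initial_wealth  # bettor's money; stays positive since |beta| < 1
    coin_sum = 0.0           # sum of past coin outcomes c_i = -g_i
    iterate_sum = 0.0
    for t in range(1, num_rounds + 1):
        beta = coin_sum / t  # KT betting fraction from rounds 1..t-1
        w = beta * wealth    # the bet is the current prediction
        g = subgradient(w)
        c = -g               # coin outcome for this round
        wealth += c * w      # win or lose the amount bet
        coin_sum += c
        iterate_sum += w
    return iterate_sum / num_rounds


# Example: minimize f(w) = |w - 3| using its sign subgradient.
w_bar = kt_coin_betting(lambda w: float((w > 3.0) - (w < 3.0)), num_rounds=10_000)
print(round(w_bar, 2))  # close to 3.0, with no tuned learning rate
```

The sketch assumes subgradients bounded by 1; rescaling by a known gradient bound is the one piece of problem knowledge it relies on.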
Findings
Algorithms run in linear time.
Achieve high-probability convergence rates matching those of GTD2 up to logarithmic factors.
Maintain high prediction performance without hyperparameter tuning.
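For reference, the GTD2 baseline named in the convergence result can be sketched as below, assuming linear features and an importance-sampling ratio `rho` for off-policy correction. This is the standard two-timescale GTD2 update, not the paper's parameter-free variant: the step sizes `alpha` and `beta` are exactly the hyperparameters the paper aims to eliminate. Each step visits every feature a constant number of times, so it runs in O(d) time for d features. The helper names `dot` and `gtd2_step` are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def gtd2_step(theta, w, phi, phi_next, reward, gamma, rho, alpha, beta):
    """One off-policy GTD2 update with linear features; O(d) per step."""
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)  # TD error
    phi_w = dot(phi, w)  # auxiliary estimate of E[rho * delta | phi]
    # Primary weights follow the gradient-correction direction (phi - gamma * phi').
    new_theta = [t + alpha * rho * (p - gamma * pn) * phi_w
                 for t, p, pn in zip(theta, phi, phi_next)]
    # Auxiliary weights regress rho * delta onto the current features.
    new_w = [wi + beta * (rho * delta - phi_w) * p for wi, p in zip(w, phi)]
    return new_theta, new_w


# Example: a single state that loops to itself with reward 1 and gamma = 0.5.
theta, w = [0.0], [0.0]
for _ in range(2000):
    theta, w = gtd2_step(theta, w, [1.0], [1.0], reward=1.0, gamma=0.5,
                         rho=1.0, alpha=0.1, beta=0.1)
print(theta[0])  # approaches the true value 1 / (1 - 0.5) = 2.0
```

Both updates are computed from the old `theta` and `w`, which matches the simultaneous form of GTD2.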
Abstract
Reinforcement learning lies at the intersection of several challenges. Many applications of interest involve extremely large state spaces, requiring function approximation to enable tractable computation. In addition, the learner has only a single stream of experience with which to evaluate a large number of possible courses of action, necessitating algorithms which can learn off-policy. However, the combination of off-policy learning with function approximation leads to divergence of temporal difference methods. Recent work on gradient-based temporal difference methods has promised a path to stability, but at the cost of expensive hyperparameter tuning. In parallel, progress in online learning has provided parameter-free methods that achieve minimax optimal guarantees up to logarithmic terms, but their application in reinforcement learning has yet to be explored. In this work, we…
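The off-policy divergence described in the abstract can be reproduced with the two-state "w to 2w" example from Sutton and Barto's textbook: a single feature takes value 1 in the first state and 2 in the second, and off-policy training repeatedly samples only the transition between them, with reward 0. The sketch below assumes an importance-sampling ratio of 1 on that transition and the arbitrary settings alpha = beta = 0.1, gamma = 0.9; `run_w_to_2w` is a hypothetical helper. Semi-gradient TD(0) multiplies its weight by 1 + alpha(2 gamma - 1) = 1.08 on every step, while GTD2 with these settings decays to the correct weight of 0 on the same data.

```python
def run_w_to_2w(steps, alpha=0.1, gamma=0.9, beta=0.1):
    """Repeat one transition (feature 1 -> feature 2, reward 0) for TD(0) and GTD2."""
    phi, phi_next, reward = 1.0, 2.0, 0.0
    w_td = 1.0             # semi-gradient TD(0) weight
    theta, aux = 1.0, 0.0  # GTD2 primary and auxiliary weights
    for _ in range(steps):
        # Semi-gradient TD(0): here this reduces to w_td *= 1 + alpha * (2 * gamma - 1).
        delta_td = reward + gamma * w_td * phi_next - w_td * phi
        w_td += alpha * delta_td * phi
        # GTD2 on the same transition, both updates using the old values.
        delta = reward + gamma * theta * phi_next - theta * phi
        theta_new = theta + alpha * (phi - gamma * phi_next) * (phi * aux)
        aux += beta * (delta - phi * aux) * phi
        theta = theta_new
    return w_td, theta


w_td, theta = run_w_to_2w(300)
print(w_td, theta)  # TD(0) weight is roughly 1e10; GTD2 weight is close to 0
```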
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Model Reduction and Neural Networks
