TIDBD: Adapting Temporal-difference Step-sizes Through Stochastic Meta-descent
Alex Kearney, Vivek Veeriah, Jaden B. Travnik, Richard S. Sutton, Patrick M. Pilarski

TL;DR
This paper introduces TIDBD, a method for automatically adapting per-feature step-sizes in TD learning, improving performance and enabling representation learning in both stationary and non-stationary tasks.
Contribution
We extend IDBD to TD learning, creating TIDBD, which adapts vector step-sizes for each feature, enhancing learning efficiency and feature relevance differentiation.
Findings
TIDBD outperforms standard TD and scalar step-size methods in prediction tasks.
TIDBD effectively distinguishes relevant from irrelevant features.
TIDBD improves robot prediction performance over existing methods.
Abstract
In this paper, we introduce a method for adapting the step-sizes of temporal-difference (TD) learning. The performance of TD methods often depends on well-chosen step-sizes, yet few algorithms have been developed for setting the step-size automatically for TD learning. An important limitation of current methods is that they adapt a single step-size shared by all the weights of the learning system. A vector step-size enables greater optimization by specifying parameters on a per-feature basis. Furthermore, adapting parameters at different rates has the added benefit of being a simple form of representation learning. We generalize Incremental Delta Bar Delta (IDBD), a vectorized adaptive step-size method for supervised learning, to TD learning, which we name TIDBD. We demonstrate that TIDBD is able to find appropriate step-sizes in both stationary and non-stationary prediction tasks,…
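To make the idea of per-feature step-size adaptation concrete, the following is a minimal sketch, not the authors' published algorithm. It assumes a linear TD(0) learner and reuses the standard IDBD update with the TD error substituted for the supervised error. The function name `idbd_td0_step`, the meta step-size `theta`, the initial step-size of 0.1, and the toy data are all illustrative assumptions. The method in the paper may differ, for example in its use of eligibility traces.

```python
import numpy as np

def idbd_td0_step(w, beta, h, phi, phi_next, reward, gamma=0.9, theta=0.01):
    """One linear TD(0) update with IDBD-style per-feature step-sizes.

    Illustrative sketch only: w are the weights, beta the log step-sizes,
    and h a decaying trace of recent weight updates (all hypothetical names).
    """
    delta = reward + gamma * (w @ phi_next) - (w @ phi)  # TD error
    beta = beta + theta * delta * phi * h                # meta-descent on log step-sizes
    alpha = np.exp(beta)                                 # one positive step-size per feature
    w = w + alpha * delta * phi                          # TD(0) weight update
    # Decay the trace where the feature was active, then add the new update.
    h = h * np.maximum(0.0, 1.0 - alpha * phi * phi) + alpha * delta * phi
    return w, beta, h, delta

# Toy usage: feature 0 is always active, feature 1 never is (assumed data).
weights = np.zeros(2)
log_steps = np.full(2, np.log(0.1))  # assumed initial step-size of 0.1
trace = np.zeros(2)
features = np.array([1.0, 0.0])
terminal = np.zeros(2)
for _ in range(5):
    weights, log_steps, trace, td_error = idbd_td0_step(
        weights, log_steps, trace, features, terminal, reward=1.0
    )
```

In this toy run only the step-size of the active feature changes, because the meta-update is gated by the feature value. That gating is the mechanism by which a vector step-size can treat features differently.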
Taxonomy
Topics: Model Reduction and Neural Networks · Reinforcement Learning in Robotics · Machine Learning and Data Classification
