Adaptive Tree Backup Algorithms for Temporal-Difference Reinforcement Learning
Brett Daley, Isaac Chan

TL;DR
This paper challenges the common belief that the interpolation parameter in Q(σ) acts as a bias-variance trade-off, showing instead that σ=0 minimizes variance and proposing adaptive methods to improve learning.
Contribution
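It introduces the Adaptive Tree Backup (ATB) algorithms, which adjust the backup strategy dynamically during learning. The aim is to balance a newly hypothesized trade-off: larger σ-values help overcome poor initializations of the value function, at the expense of higher statistical variance.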
Findings
σ=0 minimizes variance without increasing bias
Adaptive strategies outperform fixed or time-annealed σ-values
Proposed methods improve learning efficiency
Abstract
Q(σ) is a recently proposed temporal-difference learning method that interpolates between learning from expected backups and sampled backups. It has been shown that intermediate values for the interpolation parameter σ perform better in practice, and therefore it is commonly believed that σ functions as a bias-variance trade-off parameter to achieve these improvements. In our work, we disprove this notion, showing that the choice of σ=0 minimizes variance without increasing bias. This indicates that σ must have some other effect on learning that is not fully understood. As an alternative, we hypothesize the existence of a new trade-off: larger σ-values help overcome poor initializations of the value function, at the expense of higher statistical variance. To automatically balance these considerations, we propose Adaptive Tree Backup…
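To make the interpolation concrete, here is a minimal sketch of the standard one-step Q(σ) backup target, in which σ mixes a sampled next-action value (as in Sarsa) with an expectation over the target policy (as in Expected Sarsa and Tree Backup). This is not the ATB method from the paper, whose details are not given in the abstract. The function name, argument names, and toy numbers are assumptions made for illustration only.

```python
def q_sigma_target(reward, gamma, q_next, pi_next, next_action, sigma):
    """One-step Q(sigma) target (illustrative sketch, hypothetical names).

    q_next:      action values Q(s', .) for the next state, as a list
    pi_next:     target-policy probabilities pi(. | s'), as a list
    next_action: index of the action actually sampled in s'
    sigma:       interpolation parameter in [0, 1]
    """
    # Sampled backup: value of the single action that was taken.
    sampled = q_next[next_action]
    # Expected backup: policy-weighted average over all actions.
    expected = sum(p * q for p, q in zip(pi_next, q_next))
    # Interpolate between the two backups.
    return reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)


# Toy next state with two actions and a uniform target policy (assumed values).
q_next = [1.0, 3.0]
pi_next = [0.5, 0.5]

full_sample = q_sigma_target(0.0, 1.0, q_next, pi_next, next_action=1, sigma=1.0)
full_expect = q_sigma_target(0.0, 1.0, q_next, pi_next, next_action=1, sigma=0.0)
halfway = q_sigma_target(0.0, 1.0, q_next, pi_next, next_action=1, sigma=0.5)
```

In this sketch, setting sigma to 0 makes the target independent of which next action was sampled, which gives some intuition for why that choice removes one source of variance.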
Taxonomy
Topics: Data Stream Mining Techniques · Reinforcement Learning in Robotics · Neural Networks and Applications
