Adaptive Tree Backup Algorithms for Temporal-Difference Reinforcement Learning

Brett Daley; Isaac Chan

arXiv:2206.01896 · cs.LG · June 7, 2022

TL;DR

This paper challenges the common belief that the interpolation parameter σ in Q(σ) acts as a bias-variance trade-off: it shows that σ=0 minimizes variance without increasing bias, and it proposes adaptive methods to improve learning.

Contribution

It introduces the Adaptive Tree Backup (ATB) algorithms, which dynamically adjust their backup strategies to balance two considerations automatically: overcoming poor initializations of the value function and limiting statistical variance.

Findings

1. σ=0 minimizes variance without increasing bias

2. Adaptive strategies outperform fixed or time-annealed σ-values

3. Proposed methods improve learning efficiency

Abstract

Q($σ$) is a recently proposed temporal-difference learning method that interpolates between learning from expected backups and sampled backups. It has been shown that intermediate values for the interpolation parameter $σ \in [0, 1]$ perform better in practice, and therefore it is commonly believed that $σ$ functions as a bias-variance trade-off parameter to achieve these improvements. In our work, we disprove this notion, showing that the choice of $σ = 0$ minimizes variance without increasing bias. This indicates that $σ$ must have some other effect on learning that is not fully understood. As an alternative, we hypothesize the existence of a new trade-off: larger $σ$-values help overcome poor initializations of the value function, at the expense of higher statistical variance. To automatically balance these considerations, we propose Adaptive Tree Backup…
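
To make the interpolation concrete, here is a minimal sketch of the standard one-step Q(σ) target, which mixes a sampled (Sarsa-style) backup with an expected backup under the target policy. It is an illustration only, not the paper's analysis or the ATB algorithm: it assumes a fixed value estimate, a single step, and a next action drawn from the same policy used for the expectation. The function names `q_sigma_target` and `target_mean_and_variance` and the toy numbers are hypothetical.

```python
import numpy as np


def q_sigma_target(reward, gamma, q_next, pi_next, next_action, sigma):
    """One-step Q(sigma) target for a single transition.

    q_next[a]  : current estimate Q(s', a) for each action a
    pi_next[a] : target-policy probability pi(a | s')
    sigma = 1 gives the sampled backup, sigma = 0 the expected backup.
    """
    q_next = np.asarray(q_next, dtype=float)
    pi_next = np.asarray(pi_next, dtype=float)
    sampled = q_next[next_action]              # uses only the sampled action
    expected = float(np.dot(pi_next, q_next))  # averages over all actions
    return reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)


def target_mean_and_variance(reward, gamma, q_next, pi_next, sigma):
    """Exact mean and variance of the target when next_action ~ pi_next.

    Assumption for this sketch: the next action is drawn from pi_next
    itself, so no importance sampling is involved.
    """
    pi_next = np.asarray(pi_next, dtype=float)
    targets = np.array([
        q_sigma_target(reward, gamma, q_next, pi_next, a, sigma)
        for a in range(len(pi_next))
    ])
    mean = float(np.dot(pi_next, targets))
    variance = float(np.dot(pi_next, (targets - mean) ** 2))
    return mean, variance


if __name__ == "__main__":
    # Toy next state with two equally likely actions (made-up values).
    q_next, pi_next = [1.0, 3.0], [0.5, 0.5]
    for sigma in (0.0, 0.5, 1.0):
        mean, var = target_mean_and_variance(0.0, 1.0, q_next, pi_next, sigma)
        print(f"sigma={sigma}: mean={mean}, variance={var}")
```

In this toy setting the mean of the target is the same for every σ, while its variance scales with σ² and vanishes at σ = 0. That matches the abstract's claim in the simplest possible case; the paper's argument covers the general setting, which this sketch does not attempt to reproduce.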

Taxonomy

Topics: Data Stream Mining Techniques · Reinforcement Learning in Robotics · Neural Networks and Applications