Exploring TD error as a heuristic for $\sigma$ selection in Q($\sigma$, $\lambda$)

Abhishek Nan

arXiv:1912.10316 · cs.LG · December 24, 2019

TL;DR

This paper investigates using the TD error as a heuristic for adaptively selecting the parameter $\sigma$ in the Q($\sigma$, $\lambda$) algorithm, aiming to improve multistep online learning by dynamically balancing sampling and expectation.

Contribution

It introduces a novel TD-error-based scheme for adaptively choosing $\sigma$ in Q($\sigma$, $\lambda$), making the algorithm more flexible and potentially improving its performance.
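
The summary does not spell out the exact mapping from TD error to $\sigma$, so the following Python sketch is purely hypothetical. The class `TDErrorSigma` and its `scale` and `decay` parameters are illustrative names, not taken from the paper or its repository. The sketch keeps a running average of $|\delta|$ per state and maps it into $[0, 1)$ via $\sigma = 1 - e^{-\overline{|\delta|}/\text{scale}}$. States with large recent TD errors get $\sigma$ near 1 (more sampling), while states whose estimates have settled drift toward $\sigma = 0$ (more expectation). Both the exponential form and this direction are assumptions, and the opposite mapping would be just as easy to express.

```python
import math
from collections import defaultdict


class TDErrorSigma:
    """Hypothetical per-state sigma selector driven by recent TD errors.

    Assumption: larger recent |TD error| -> sigma closer to 1 (more sampling).
    This is an illustrative rule, not the paper's published scheme.
    """

    def __init__(self, scale: float = 1.0, decay: float = 0.9) -> None:
        if scale <= 0.0 or not 0.0 <= decay < 1.0:
            raise ValueError("need scale > 0 and 0 <= decay < 1")
        self.scale = scale
        self.decay = decay
        # Exponential running average of |TD error| per state; unseen states start at 0.
        self.avg_abs_td: defaultdict = defaultdict(float)

    def update(self, state, td_error: float) -> None:
        """Blend the latest |TD error| for `state` into its running average."""
        old = self.avg_abs_td[state]
        self.avg_abs_td[state] = self.decay * old + (1.0 - self.decay) * abs(td_error)

    def sigma(self, state) -> float:
        """Map the state's running |TD error| into [0, 1); 0 when there is no history."""
        return 1.0 - math.exp(-self.avg_abs_td[state] / self.scale)
```

In a learning loop, one would call `sigma(state)` before forming the Q($\sigma$) target and `update(state, td_error)` after computing the resulting TD error.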

Findings

1. TD error can effectively guide $\sigma$ selection.
2. Adaptive $\sigma$ improves learning stability.
3. The method shows promise in online multistep TD algorithms.

Abstract

In the landscape of TD algorithms, Q($\sigma$, $\lambda$) is an algorithm that can perform multistep backups in an online manner while also successfully unifying sampling with the use of the expectation across all actions in a state. $\sigma \in [0, 1]$ indicates the extent to which sampling is used. Rather than being constant or time-based, the value of $\sigma$ can be selected based on characteristics of the current state. This report explores the viability of such a TD-error-based scheme.
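
To make the role of $\sigma$ concrete, here is a minimal Python sketch of the one-step Q($\sigma$) TD error, $\delta = R + \gamma\left[\sigma\, Q(S', A') + (1 - \sigma) \sum_a \pi(a \mid S')\, Q(S', a)\right] - Q(S, A)$. It assumes a tabular setting where `q_s` and `q_next` hold the action values of the current and next state and `pi_next` holds the target policy's action probabilities in the next state. With $\sigma = 1$ the target uses the sampled next action (as in Sarsa); with $\sigma = 0$ it uses the expectation over all actions (as in Expected Sarsa). The function name and signature are illustrative, not taken from the paper or its repository, and the eligibility-trace machinery that makes Q($\sigma$, $\lambda$) multistep and online is deliberately omitted.

```python
from typing import Sequence


def q_sigma_td_error(
    q_s: Sequence[float],
    action: int,
    reward: float,
    q_next: Sequence[float],
    next_action: int,
    pi_next: Sequence[float],
    sigma: float,
    gamma: float = 0.99,
) -> float:
    """One-step Q(sigma) TD error for a tabular action-value estimate.

    sigma = 1 -> fully sampled (Sarsa-style) target.
    sigma = 0 -> full expectation (Expected Sarsa-style) target.
    """
    if not 0.0 <= sigma <= 1.0:
        raise ValueError("sigma must lie in [0, 1]")
    # Value of the action actually sampled in the next state.
    sampled = q_next[next_action]
    # Expected next-state value under the target policy pi.
    expected = sum(p * q for p, q in zip(pi_next, q_next))
    # Blend the two backups according to sigma, then bootstrap.
    target = reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)
    return target - q_s[action]
```

For example, with `q_s=[0.0, 1.0]`, `action=1`, `reward=1.0`, `q_next=[2.0, 4.0]`, `next_action=1`, `pi_next=[0.5, 0.5]` and `gamma=1.0`, the sampled and expected next values are 4.0 and 3.0, so the TD error moves from 3.0 at $\sigma = 0$ to 4.0 at $\sigma = 1$.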

Code & Models

Repositories

abnan/CMPUT_609_Project

Taxonomy

Topics: Advanced Data Storage Technologies · Algorithms and Data Compression · Distributed systems and fault tolerance