Exploring TD error as a heuristic for $\sigma$ selection in Q($\sigma$, $\lambda$)
Abhishek Nan

TL;DR
This paper investigates using TD error as a heuristic for adaptively selecting the parameter $\sigma$ in the Q($\sigma$, $\lambda$) algorithm, aiming to improve multistep online learning by dynamically balancing sampling and expectation.
Contribution
It proposes a TD-error based scheme for adaptively choosing $\sigma$ in Q($\sigma$, $\lambda$) and examines whether such a scheme is viable.
Findings
TD-error can effectively guide sigma selection
Adaptive sigma improves learning stability
Method shows promise in online multistep TD algorithms
Abstract
Among TD algorithms, Q($\sigma$, $\lambda$) can perform multistep backups online while unifying sampling with taking the expectation across all actions for a state. The parameter $\sigma$ indicates the extent to which sampling is used. The value of $\sigma$ can be selected based on characteristics of the current state rather than being held constant or varied with time. This report explores the viability of such a scheme based on the TD error.
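To make the role of $\sigma$ concrete, here is a minimal Python sketch of the one-step Q($\sigma$) backup target, which blends the sampled (Sarsa-style) and expected (Expected Sarsa-style) next-state values. The function names, the threshold rule in `sigma_from_td_error`, and the example numbers are all assumptions made for illustration. The abstract does not specify the paper's actual selection rule, so this should not be read as the author's method.

```python
import numpy as np

def q_sigma_target(reward, gamma, q_next, pi_next, next_action, sigma):
    """One-step Q(sigma) target.

    sigma = 1 gives the sampled (Sarsa) target,
    sigma = 0 gives the expected (Expected Sarsa) target.
    """
    sampled = q_next[next_action]              # value of the action actually taken
    expected = float(np.dot(pi_next, q_next))  # expectation under the policy
    return reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)

def sigma_from_td_error(td_error, threshold=0.5):
    """Hypothetical rule (an assumption, not taken from the paper):
    sample fully when the absolute TD error is large, otherwise
    use the expectation."""
    return 1.0 if abs(td_error) > threshold else 0.0

# Illustrative numbers only.
q_next = np.array([1.0, 3.0])   # Q(s', .) for two actions
pi_next = np.array([0.5, 0.5])  # policy probabilities in s'
q_sa = 2.0                      # current estimate Q(s, a)

# TD error measured against the expected target (sigma = 0).
expected_target = q_sigma_target(1.0, 0.9, q_next, pi_next, 1, sigma=0.0)
td_error = expected_target - q_sa

# State-dependent sigma, then the target actually used for the update.
sigma = sigma_from_td_error(td_error)
target = q_sigma_target(1.0, 0.9, q_next, pi_next, 1, sigma)
```

In this sketch the TD error is computed against the expected target before $\sigma$ is chosen. That is one possible design among several, and the threshold of 0.5 is arbitrary.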
Taxonomy
Topics: Advanced Data Storage Technologies · Algorithms and Data Compression · Distributed systems and fault tolerance
