Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning

Brett Daley; Martha White; Christopher Amato; Marlos C. Machado

arXiv:2301.11321 · cs.LG · December 23, 2025

TL;DR

This paper introduces a multistep operator for off-policy reinforcement learning that unifies per-decision and trajectory-aware eligibility-trace methods, provides convergence guarantees for them, and proposes Recency-Bounded Importance Sampling (RBIS), a trajectory-aware method with robust performance.

Contribution

The paper proposes a multistep operator that encompasses existing per-decision and trajectory-aware methods, proves convergence conditions for it in the tabular setting, and introduces RBIS for more robust off-policy control.

Findings

01. Proves convergence conditions for the new operator and existing methods.

02. Introduces RBIS, a trajectory-aware sampling method with robust performance.

03. Unifies per-decision and trajectory-aware approaches under a common framework.

Abstract

Off-policy learning from multistep returns is crucial for sample-efficient reinforcement learning, but counteracting off-policy bias without exacerbating variance is challenging. Classically, off-policy bias is corrected in a per-decision manner: past temporal-difference errors are re-weighted by the instantaneous Importance Sampling (IS) ratio after each action via eligibility traces. Many off-policy algorithms rely on this mechanism, along with differing protocols for cutting the IS ratios to combat the variance of the IS estimator. Unfortunately, once a trace has been fully cut, the effect cannot be reversed. This has led to the development of credit-assignment strategies that account for multiple past experiences at a time. These trajectory-aware methods have not been extensively analyzed, and their theoretical justification remains uncertain. In this paper, we propose a multistep…
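
As a rough illustration of the per-decision mechanism the abstract describes, the sketch below assumes a Retrace-style cutting rule, c = λ · min(1, π/μ). This is one of several published cutting protocols and is an assumption here, not something stated on this page; the function names are hypothetical, and discounting is omitted for brevity. It shows why a fully cut trace cannot recover: each trace weight is a running product, so a single zero coefficient zeroes every later weight.

```python
def cut_ratios(pi_probs, mu_probs, lam=0.9):
    """Per-decision coefficients c_t = lam * min(1, pi_t / mu_t).

    Assumed Retrace-style rule; pi_probs and mu_probs are the target and
    behavior probabilities of the actions actually taken at each step.
    """
    return [lam * min(1.0, p / m) for p, m in zip(pi_probs, mu_probs)]


def cumulative_traces(coeffs):
    """Running product of coefficients: the weight given to each later TD error."""
    traces, prod = [], 1.0
    for c in coeffs:
        prod *= c
        traces.append(prod)
    return traces


# The second action has zero probability under the target policy, so its
# coefficient is 0 and the trace stays at 0 even though the third action
# is strongly favored by the target policy.
coeffs = cut_ratios([0.5, 0.0, 1.0], [0.5, 0.5, 0.5], lam=1.0)
print(coeffs)                     # [1.0, 0.0, 1.0]
print(cumulative_traces(coeffs))  # [1.0, 0.0, 0.0]
```

A trajectory-aware method, by contrast, would compute each weight from the whole history of ratios rather than from a product of independently cut per-step terms.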

Code & Models

Repositories

brett-daley/trajectory-aware-etraces (Official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management · Advanced Bandit Algorithms Research