Deep Reinforcement Learning with Gradient Eligibility Traces

Esraa Elelimy; Brett Daley; Andrew Patterson; Marlos C. Machado; Adam White; Martha White

arXiv:2507.09087 · cs.LG · September 22, 2025

TL;DR

This paper extends Gradient TD methods to support multistep credit assignment in deep reinforcement learning, improving stability and efficiency over existing methods, and demonstrates superior performance in MuJoCo and MinAtar environments.

Contribution

The paper introduces a multistep extension of the generalized Projected Bellman Error ($\overline{PBE}$) objective for Gradient TD methods, together with new deep RL algorithms that optimize it and outperform existing approaches.
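The multistep extension is built on the λ-return (as the abstract states). As a rough illustration only, not code from the paper or its repository, the standard backward recursion $G_t^\lambda = R_{t+1} + \gamma[(1-\lambda)V(S_{t+1}) + \lambda G_{t+1}^\lambda]$ can be computed as below. The function name `lambda_returns` and its argument layout are hypothetical, and the sketch assumes a single trajectory segment in which only the final next state may be terminal.

```python
from typing import List


def lambda_returns(rewards: List[float], next_values: List[float],
                   gamma: float, lam: float) -> List[float]:
    """Compute truncated λ-returns for one trajectory segment (illustrative sketch).

    rewards[t]     : R_{t+1}, the reward received after step t
    next_values[t] : V(S_{t+1}), the current value estimate of the next state
                     (assumed 0.0 when S_{t+1} is terminal; only the last
                     next state of the segment may be terminal)
    """
    if len(rewards) != len(next_values):
        raise ValueError("rewards and next_values must have the same length")
    returns = [0.0] * len(rewards)
    # Past the end of the segment, bootstrap fully from the last value estimate.
    g = next_values[-1] if next_values else 0.0
    for t in reversed(range(len(rewards))):
        # G_t^λ = R_{t+1} + γ[(1 - λ) V(S_{t+1}) + λ G_{t+1}^λ]
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g)
        returns[t] = g
    return returns
```

For example, `lambda_returns([1.0, 1.0, 1.0], [0.5, 0.5, 0.0], gamma=0.9, lam=1.0)` gives approximately `[2.71, 1.9, 1.0]` (the Monte Carlo returns), while `lam=0.0` gives approximately `[1.45, 1.45, 1.0]` (the one-step TD targets); intermediate λ interpolates between the two.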

Findings

1. The proposed algorithms outperform PPO and StreamQ in MuJoCo and MinAtar.

2. The extended GTD methods achieve better stability and sample efficiency.

3. Multistep credit assignment is shown to be effective in deep RL environments.

Abstract

Achieving fast and stable off-policy learning in deep reinforcement learning (RL) is challenging. Most existing methods rely on semi-gradient temporal-difference (TD) methods for their simplicity and efficiency, but are consequently susceptible to divergence. While more principled approaches like Gradient TD (GTD) methods have strong convergence guarantees, they have rarely been used in deep RL. Recent work introduced the generalized Projected Bellman Error ($\overline{PBE}$), enabling GTD methods to work efficiently with nonlinear function approximation. However, this work is limited to one-step methods, which are slow at credit assignment and require a large number of samples. In this paper, we extend the generalized $\overline{PBE}$ objective to support multistep credit assignment based on the $\lambda$-return and derive three gradient-based methods that optimize this…
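To make the contrast between semi-gradient TD and Gradient TD concrete, here is a minimal sketch of the classic linear GTD(λ) update with eligibility traces, in the form given in Sutton and Barto's *Reinforcement Learning: An Introduction* (2nd ed., Section 12.11). It is not one of the paper's nonlinear algorithms derived from the generalized $\overline{PBE}$. The helper names `dot` and `gtd_lambda_step` are hypothetical, features are assumed to be plain Python lists, and `rho` is assumed to be the per-step importance-sampling ratio $\pi(A_t|S_t)/b(A_t|S_t)$.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length feature/weight vectors."""
    return sum(x * y for x, y in zip(a, b))


def gtd_lambda_step(w: Vector, v: Vector, z: Vector,
                    x: Vector, x_next: Vector, reward: float, rho: float,
                    gamma: float, lam: float,
                    alpha: float, beta: float) -> Tuple[Vector, Vector, Vector]:
    """One linear GTD(λ) step (illustrative sketch); returns updated (w, v, z).

    w : primary weights, value estimate V(s) = w · x(s)
    v : secondary weights used for the gradient correction
    z : importance-weighted accumulating eligibility trace
    x, x_next : feature vectors of S_t and S_{t+1} (zeros if S_{t+1} is terminal)
    rho : importance-sampling ratio π(A_t|S_t) / b(A_t|S_t)
    """
    # Trace update: z_t = ρ_t (γ λ z_{t-1} + x_t)
    z = [rho * (gamma * lam * zi + xi) for zi, xi in zip(z, x)]
    # One-step TD error under the current primary weights
    delta = reward + gamma * dot(w, x_next) - dot(w, x)
    z_dot_v = dot(z, v)
    v_dot_x = dot(v, x)
    # Primary update: semi-gradient TD(λ) term minus the gradient-correction term
    w = [wi + alpha * (delta * zi - gamma * (1.0 - lam) * z_dot_v * xni)
         for wi, zi, xni in zip(w, z, x_next)]
    # Secondary update: LMS-style rule driven by the trace-weighted TD error
    v = [vi + beta * (delta * zi - v_dot_x * xi)
         for vi, zi, xi in zip(v, z, x)]
    return w, v, z
```

If `v` is held at zero, the `w` update reduces to off-policy semi-gradient TD(λ); the secondary weights and the correction term are what give GTD(λ) its convergence guarantees under off-policy training with linear features and suitable step sizes. The trace `z` should be reset to zeros at the start of each episode.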

Code & Models

Repositories

esraaelelimy/gtd_algos (JAX, official)

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Experience Replay · Proximal Policy Optimization