TL;DR
This paper introduces expected eligibility traces for reinforcement learning. A single update assigns credit not only to the states and actions the agent actually visited, but also to those that could have preceded the current state, which can improve learning efficiency.
Contribution
The work proposes expected eligibility traces, which generalize classic (instantaneous) traces by also updating counterfactual states and actions, and provides a way to interpolate smoothly between the instantaneous and expected variants.
Findings
Expected traces can give substantial improvements over classic (instantaneous) traces in temporal-difference learning in some settings.
The interpolation mechanism makes the resulting algorithm a generalization of TD(λ).
Potential connections to successor features are discussed.
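To make the idea concrete, here is a minimal tabular sketch, not the paper's code. It assumes a small random-walk chain, an accumulating trace, and an expected trace `z[s]` learned by simple regression toward the instantaneous trace `e`. The environment, the function name `run_chain`, and all step sizes (`alpha`, `beta`) are illustrative assumptions.

```python
import numpy as np

def run_chain(n_states=5, episodes=200, alpha=0.1, beta=0.1,
              gamma=0.99, lam=0.9, use_expected=True, seed=0):
    """Tabular TD(lambda) on a random-walk chain (hypothetical toy setup).

    If use_expected is True, the TD error is applied through a learned
    estimate z[s] of E[e_t | S_t = s] instead of the instantaneous trace e_t.
    """
    rng = np.random.default_rng(seed)
    v = np.zeros(n_states)               # value estimates
    z = np.zeros((n_states, n_states))   # z[s]: expected trace given state s
    for _ in range(episodes):
        s = n_states // 2                # start in the middle of the chain
        e = np.zeros(n_states)           # instantaneous accumulating trace
        done = False
        while not done:
            # Classic trace: decay, then bump the current state.
            e = gamma * lam * e
            e[s] += 1.0
            # Regress the expected trace for this state toward e.
            z[s] += beta * (e - z[s])
            # Step left or right; reward 1 only when exiting on the right.
            s_next = s + rng.choice([-1, 1])
            done = s_next < 0 or s_next >= n_states
            reward = 1.0 if s_next >= n_states else 0.0
            target = reward if done else reward + gamma * v[s_next]
            delta = target - v[s]        # TD error
            # Credit assignment: expected trace or instantaneous trace.
            trace = z[s] if use_expected else e
            v += alpha * delta * trace
            s = s_next
    return v

v_classic = run_chain(use_expected=False)
v_expected = run_chain(use_expected=True)
```

With `use_expected=False` this reduces to ordinary accumulating-trace TD(λ). With `use_expected=True`, states that often precede `s` receive credit even in episodes where they were not visited. The sketch omits the interpolation between the two variants.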
Abstract
The question of how to determine which states and actions are responsible for a certain outcome is known as the credit assignment problem and remains a central research question in reinforcement learning and artificial intelligence. Eligibility traces enable efficient credit assignment to the recent sequence of states and actions experienced by the agent, but not to counterfactual sequences that could also have led to the current state. In this work, we introduce expected eligibility traces. Expected traces allow, with a single update, to update states and actions that could have preceded the current state, even if they did not do so on this occasion. We discuss when expected traces provide benefits over classic (instantaneous) traces in temporal-difference learning, and show that sometimes substantial improvements can be attained. We provide a way to smoothly interpolate between…