Demystifying the Recency Heuristic in Temporal-Difference Learning
Brett Daley, Marlos C. Machado, Martha White

TL;DR
This paper provides a theoretical analysis of the recency heuristic in TD learning, showing it guarantees convergence, fast contraction, and effective credit assignment, while violating it can lead to divergence.
Contribution
It offers the first theoretical evidence that the recency heuristic in TD learning facilitates convergence and effective credit assignment.
Findings
Any return estimator satisfying the recency heuristic is guaranteed to converge to the correct value function.
Such estimators have a relatively fast contraction rate.
Violating the heuristic can cause TD methods to diverge.
Abstract
The recency heuristic in reinforcement learning is the assumption that stimuli that occurred closer in time to an acquired reward should be more heavily reinforced. The recency heuristic is one of the key assumptions made by TD(λ), which reinforces recent experiences according to an exponentially decaying weighting. In fact, all other widely used return estimators for TD learning, such as n-step returns, satisfy a weaker (i.e., non-monotonic) recency heuristic. Why is the recency heuristic effective for temporal credit assignment? What happens when credit is assigned in a way that violates this heuristic? In this paper, we analyze the specific mathematical implications of adopting the recency heuristic in TD learning. We prove that any return estimator satisfying this heuristic: 1) is guaranteed to converge to the correct value function, 2) has a relatively fast contraction…
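To make the weighting described in the abstract concrete, the following Python sketch writes a return estimator as the current value estimate plus a weighted sum of future TD errors. It relies on two standard identities: the λ-return weights the TD error i steps ahead by (γλ)^i, and the n-step return weights it by γ^i for i < n and by 0 afterwards. The function names, and the choice to test the heuristic as "weights never increase with delay", are assumptions made for illustration; the paper's formal definition may differ in detail.

```python
# Illustrative sketch only: names and the exact recency check are assumptions,
# not taken from the paper.

def lambda_return_weights(gamma, lam, horizon):
    """Weight on the TD error i steps ahead under the lambda-return: (gamma*lam)**i."""
    return [(gamma * lam) ** i for i in range(horizon)]


def n_step_weights(gamma, n, horizon):
    """Weight on the TD error i steps ahead under the n-step return:
    gamma**i for i < n, and 0 afterwards."""
    return [gamma ** i if i < n else 0.0 for i in range(horizon)]


def is_non_increasing(weights):
    """Assumed weak form of the heuristic: later TD errors never get more weight."""
    return all(a >= b for a, b in zip(weights, weights[1:]))


def is_strictly_decreasing(weights):
    """Assumed strong form: each later TD error gets strictly less weight."""
    return all(a > b for a, b in zip(weights, weights[1:]))


def weighted_td_return(value, td_errors, weights):
    """Return estimate = current value estimate + weighted sum of TD errors."""
    return value + sum(w * d for w, d in zip(weights, td_errors))
```

Under these assumptions, `lambda_return_weights(0.9, 0.5, 5)` is strictly decreasing, while `n_step_weights(0.9, 2, 5)` is non-increasing but not strictly decreasing because of its trailing zeros. A hypothetical weighting such as `[0.1, 0.5, 1.0]`, which favors more distant TD errors, fails both checks.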
Taxonomy
Topics: Speech and dialogue systems · Language, Discourse, Communication Strategies
