Sequence Compression Speeds Up Credit Assignment in Reinforcement Learning
Aditya A. Ramesh, Kenny Young, Louis Kirsch, Jürgen Schmidhuber

TL;DR
This paper introduces Chunked-TD, a novel method that uses trajectory chunking and learned world models to accelerate credit assignment in reinforcement learning, reducing reliance on model accuracy and improving learning speed.
Contribution
The paper presents Chunked-TD, a new approach that leverages trajectory chunking with learned models to improve credit assignment speed in reinforcement learning.
Findings
Chunked-TD learns faster than conventional TD(λ) on certain problems.
The method effectively compresses environment trajectories to speed up learning.
Chunked-TD maintains robustness despite model inaccuracies.
Abstract
Temporal credit assignment in reinforcement learning is challenging due to delayed and stochastic outcomes. Monte Carlo targets can bridge long delays between action and consequence but lead to high-variance targets due to stochasticity. Temporal difference (TD) learning uses bootstrapping to overcome variance but introduces a bias that can only be corrected through many iterations. TD(λ) provides a mechanism to navigate this bias-variance tradeoff smoothly. Appropriately selecting λ can significantly improve performance. Here, we propose Chunked-TD, which uses predicted probabilities of transitions from a model for computing λ-return targets. Unlike other model-based solutions to credit assignment, Chunked-TD is less vulnerable to model inaccuracies. Our approach is motivated by the principle of history compression and 'chunks' trajectories for conventional TD…
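To make the idea of computing λ-return targets from predicted transition probabilities concrete, the following is a minimal sketch, not the authors' implementation. It uses the standard backward recursion for λ-returns with a per-step λ. The function name `lambda_returns`, the argument layout, and the choice to set each step's λ to the model's predicted probability of the observed transition are assumptions made for illustration; the abstract only states that model probabilities are used when computing the targets.

```python
from typing import List, Sequence


def lambda_returns(
    rewards: Sequence[float],
    values: Sequence[float],
    lambdas: Sequence[float],
    gamma: float = 1.0,
) -> List[float]:
    """Per-step lambda-return targets via the usual backward recursion.

    rewards[t]  reward received on the transition s_t -> s_{t+1} (length T)
    values[t]   current value estimate of s_t, t = 0..T (length T + 1);
                use 0.0 for values[T] if s_T is terminal
    lambdas[t]  weight in [0, 1] on continuing past s_{t+1} instead of
                bootstrapping from values[t + 1] (length T)

    ASSUMPTION (illustrative only): a Chunked-TD-style caller would pass
    the model's predicted probability of each observed transition here,
    so near-deterministic steps are skipped over ("chunked") and
    surprising steps trigger bootstrapping.
    """
    T = len(rewards)
    if len(values) != T + 1 or len(lambdas) != T:
        raise ValueError("expected len(values) == T + 1 and len(lambdas) == T")

    targets = [0.0] * T
    next_return = values[T]  # nothing beyond s_T to roll forward into
    for t in reversed(range(T)):
        # Blend the bootstrapped value of s_{t+1} with the longer return.
        mixed = (1.0 - lambdas[t]) * values[t + 1] + lambdas[t] * next_return
        targets[t] = rewards[t] + gamma * mixed
        next_return = targets[t]
    return targets


if __name__ == "__main__":
    rewards = [0.0, 0.0, 1.0]
    values = [0.5, 0.5, 0.5, 0.0]  # final state is terminal
    # Hypothetical model probabilities of the three observed transitions.
    model_probs = [1.0, 0.5, 1.0]
    print(lambda_returns(rewards, values, [0.0] * 3))  # one-step TD targets
    print(lambda_returns(rewards, values, [1.0] * 3))  # Monte Carlo returns
    print(lambda_returns(rewards, values, model_probs))
```

With all weights at 0 the recursion reduces to one-step TD targets, and with all weights at 1 it reduces to Monte Carlo returns, so a per-step weight taken from a model interpolates between the two depending on how predictable each transition was.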
Taxonomy
Topics: Evolutionary Algorithms and Applications · Reinforcement Learning in Robotics
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
