Sequence Compression Speeds Up Credit Assignment in Reinforcement Learning

Aditya A. Ramesh; Kenny Young; Louis Kirsch; Jürgen Schmidhuber

arXiv:2405.03878·cs.LG·June 5, 2024

TL;DR

This paper introduces Chunked-TD, which uses transition probabilities predicted by a learned model to compute λ-return targets. This "chunks" trajectories for conventional TD learning, speeding up credit assignment while remaining less vulnerable to model inaccuracies than other model-based approaches.

Contribution

The paper presents Chunked-TD, an approach motivated by the principle of history compression that sets the λ-return targets of TD learning from a learned model's predicted transition probabilities.

Findings

01 Chunked-TD learns faster than conventional TD(λ) on certain problems.

02 The method compresses ("chunks") environment trajectories to speed up credit assignment.

03 Chunked-TD is less vulnerable to model inaccuracies than other model-based approaches to credit assignment.

Abstract

Temporal credit assignment in reinforcement learning is challenging due to delayed and stochastic outcomes. Monte Carlo targets can bridge long delays between action and consequence but lead to high-variance targets due to stochasticity. Temporal difference (TD) learning uses bootstrapping to overcome variance but introduces a bias that can only be corrected through many iterations. TD($λ$) provides a mechanism to navigate this bias-variance tradeoff smoothly. Appropriately selecting $λ$ can significantly improve performance. Here, we propose Chunked-TD, which uses predicted probabilities of transitions from a model for computing $λ$-return targets. Unlike other model-based solutions to credit assignment, Chunked-TD is less vulnerable to model inaccuracies. Our approach is motivated by the principle of history compression and 'chunks' trajectories for conventional TD…
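
To make the idea of a λ-return with a step-dependent λ concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes one reading of the abstract, namely that the λ used at each step is the model's predicted probability of the transition that was actually observed. The function name `lambda_returns` and its argument names are hypothetical.

```python
def lambda_returns(rewards, next_values, lambdas, gamma=1.0):
    """Backward recursion for lambda-return targets with a per-step lambda.

    rewards[t]     : reward received on transition t
    next_values[t] : value estimate of the state reached by transition t
                     (use 0.0 if that state is terminal)
    lambdas[t]     : weight on the continuing return after transition t;
                     ASSUMPTION: in a Chunked-TD-style setup this would be the
                     model's predicted probability of the observed transition
    """
    if not (len(rewards) == len(next_values) == len(lambdas)):
        raise ValueError("rewards, next_values and lambdas must have equal length")
    if not rewards:
        return []

    targets = [0.0] * len(rewards)
    # Beyond the last stored transition we can only bootstrap from the value estimate.
    g_next = next_values[-1]
    for t in reversed(range(len(rewards))):
        lam = lambdas[t]
        # lam = 0 gives the one-step TD target; lam = 1 passes the full return through.
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g_next)
        targets[t] = g
        g_next = g
    return targets


rewards = [0.0, 0.0, 1.0]
next_values = [0.5, 0.2, 0.0]  # last state is terminal, so its value is 0

print(lambda_returns(rewards, next_values, [1.0, 1.0, 1.0]))  # [1.0, 1.0, 1.0] (Monte Carlo)
print(lambda_returns(rewards, next_values, [0.0, 0.0, 0.0]))  # [0.5, 0.2, 1.0] (one-step TD)
```

Under the stated assumption, a transition the model predicts with probability close to 1 passes the later return through almost unchanged, so a stretch of such transitions behaves like a single chunk. A transition the model finds unlikely pushes the target toward the bootstrapped value estimate instead.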

Code & Models

Repositories

aditya-ramesh-10/chunktd (PyTorch, official)

Taxonomy

Topics: Evolutionary Algorithms and Applications · Reinforcement Learning in Robotics