Recall Traces: Backtracking Models for Efficient Reinforcement Learning

Anirudh Goyal; Philemon Brakel; William Fedus; Soumye Singhal; Timothy Lillicrap; Sergey Levine; Hugo Larochelle; Yoshua Bengio

arXiv:1804.00379 · cs.LG · January 30, 2019 · 25 citations

Open Access

TL;DR

This paper introduces Recall Traces, a backtracking model that predicts trajectories leading to high-reward states, enhancing sample efficiency in reinforcement learning by focusing training on relevant high-value experiences.

Contribution

It proposes a backtracking model that generates trajectories ending in high-reward states. It then uses these trajectories to make reinforcement learning more sample-efficient.

Findings

01

Improves sample efficiency in RL tasks

02

Effective for both on-policy and off-policy algorithms

03

Demonstrates benefits across multiple environments

Abstract

In many environments only a tiny subset of all states yields high reward. In these cases, few of the interactions with the environment provide a relevant learning signal. Hence, we may want to train preferentially on those high-reward states and on the probable trajectories leading to them. To this end, we advocate the use of a backtracking model that predicts the preceding states that lead to a given high-reward state. We can train a model that starts from a high-value state (or one that is estimated to have high value) and predicts and samples the (state, action) tuples that may have led to that state. We call these traces of (state, action) pairs Recall Traces. They are sampled from the backtracking model starting from a high-value state. They are informative because they terminate in good states, so we can use them to improve a policy. We provide a…
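The abstract's pipeline has three steps. First, fit a backtracking model on observed transitions. Second, sample (state, action) traces backwards from a high-value state. Third, use those traces to improve a policy. A toy tabular setting is enough to sketch it. Everything here is an illustrative assumption, not the authors' implementation:

- The 1-D chain environment, `step`, and `collect_transitions` are hypothetical.
- The count table in the hypothetical `TabularBacktrackingModel` stands in for the learned model the abstract describes.
- The goal state is simply assumed to be the known high-value start.
- The policy improvement step is a hypothetical `imitation_policy`. It fits a tabular maximum-likelihood policy pi(a | s) to the traces.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy environment (not from the paper): states 0..N_STATES-1 on a
# line, actions -1 (left) / +1 (right); reaching GOAL ends the episode.
N_STATES = 6
ACTIONS = (-1, +1)
GOAL = N_STATES - 1


def step(state, action):
    """Deterministic chain dynamics, clipped to the valid state range."""
    return min(max(state + action, 0), N_STATES - 1)


def collect_transitions(n_episodes, horizon, rng):
    """Roll out a uniform random policy from state 0 and record (s, a, s') triples."""
    transitions = []
    for _ in range(n_episodes):
        s = 0
        for _ in range(horizon):
            a = rng.choice(ACTIONS)
            s_next = step(s, a)
            transitions.append((s, a, s_next))
            if s_next == GOAL:
                break  # episode terminates at the rewarding state
            s = s_next
    return transitions


class TabularBacktrackingModel:
    """Count-based stand-in (an assumption) for a learned backtracking model B(s, a | s').

    For each successor state s' it stores how often each (s, a) pair led there,
    so predecessors are sampled in proportion to their observed frequency.
    """

    def __init__(self):
        self.counts = defaultdict(Counter)  # s' -> Counter{(s, a): count}

    def fit(self, transitions):
        for s, a, s_next in transitions:
            self.counts[s_next][(s, a)] += 1

    def sample_predecessor(self, s_next, rng):
        preds = self.counts.get(s_next)
        if not preds:
            return None  # s_next was never reached in the data
        pairs = list(preds)
        weights = [preds[p] for p in pairs]
        return rng.choices(pairs, weights=weights, k=1)[0]

    def recall_trace(self, high_value_state, length, rng):
        """Sample up to `length` (s, a) pairs backwards from a high-value state.

        The result is returned in chronological order, so its last pair is an
        action that leads into `high_value_state`.
        """
        trace = []
        s_next = high_value_state
        for _ in range(length):
            pred = self.sample_predecessor(s_next, rng)
            if pred is None:
                break
            trace.append(pred)
            s_next = pred[0]  # keep walking backwards from the predecessor state
        trace.reverse()
        return trace


def imitation_policy(traces):
    """Tabular maximum-likelihood policy pi(a | s) fitted to the recall traces."""
    action_counts = defaultdict(Counter)
    for trace in traces:
        for s, a in trace:
            action_counts[s][a] += 1
    return {
        s: {a: n / sum(c.values()) for a, n in c.items()}
        for s, c in action_counts.items()
    }


if __name__ == "__main__":
    rng = random.Random(0)
    model = TabularBacktrackingModel()
    model.fit(collect_transitions(n_episodes=200, horizon=30, rng=rng))
    traces = [model.recall_trace(GOAL, length=8, rng=rng) for _ in range(20)]
    print(traces[0])
    print(imitation_policy(traces))
```

In this toy chain the goal can only be entered from state 4 by moving right. Every sampled trace therefore ends with the pair `(4, 1)`, and each earlier pair's action leads to the next pair's state. Storing whole (s, a) pairs per successor state is just the tabular form of a backward conditional. A function-approximation version would need to parameterize that distribution explicitly.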

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Advanced Bandit Algorithms Research