Recall Traces: Backtracking Models for Efficient Reinforcement Learning
Anirudh Goyal, Philemon Brakel, William Fedus, Soumye Singhal, Timothy Lillicrap, Sergey Levine, Hugo Larochelle, Yoshua Bengio

TL;DR
This paper introduces Recall Traces, a backtracking model that predicts trajectories leading to high-reward states, enhancing sample efficiency in reinforcement learning by focusing training on relevant high-value experiences.
Contribution
It proposes a novel backtracking model to generate high-reward trajectories, improving reinforcement learning efficiency beyond existing methods.
Findings
Improves sample efficiency in RL tasks
Effective for both on-policy and off-policy algorithms
Demonstrates benefits across multiple environments
Abstract
In many environments only a tiny subset of all states yields high reward. In these cases, few of the interactions with the environment provide a relevant learning signal. Hence, we may want to preferentially train on those high-reward states and the probable trajectories leading to them. To this end, we advocate for the use of a backtracking model that predicts the preceding states of trajectories that terminate at a given high-reward state. We can train a model which, starting from a high-value state (or one that is estimated to have high value), predicts and samples which (state, action) tuples may have led to that high-value state. These traces of (state, action) pairs, which we refer to as Recall Traces, are sampled from this backtracking model starting from a high-value state. They are informative because they terminate in good states, and hence we can use them to improve a policy. We provide a…
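To make the backward-sampling idea in the abstract concrete, here is a minimal sketch on a toy problem. It is not the authors' implementation: the abstract does not give the model architecture, so this substitutes a tabular lookup of observed predecessors for a learned backtracking model. The chain environment, the constants `N_STATES` and `ACTIONS`, and the functions `step`, `fit_backward_model`, and `sample_recall_trace` are all hypothetical names introduced only for illustration.

```python
import random
from collections import defaultdict

# Hypothetical toy setup (an assumption, not from the paper):
# a chain of states 0..5 in which state 5 is the single high-reward state.
N_STATES = 6
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Deterministic toy dynamics: move along the chain, clipped at both ends."""
    return min(max(state + action, 0), N_STATES - 1)


def fit_backward_model(transitions):
    """Tabular stand-in for a backtracking model.

    For each next state s', store every (s, a) pair observed to lead to it.
    A learned model would replace this table with a parametric distribution.
    """
    predecessors = defaultdict(list)
    for state, action, next_state in transitions:
        if state != next_state:  # ignore no-op moves against the walls
            predecessors[next_state].append((state, action))
    return predecessors


def sample_recall_trace(predecessors, high_value_state, length, rng):
    """Walk backward from a high-value state, sampling predecessor (s, a) pairs.

    The result is returned in forward order, so executing its actions in
    sequence ends at `high_value_state`.
    """
    trace = []
    state = high_value_state
    for _ in range(length):
        candidates = predecessors.get(state)
        if not candidates:
            break  # no known predecessor for this state
        prev_state, action = rng.choice(candidates)
        trace.append((prev_state, action))
        state = prev_state
    trace.reverse()
    return trace


# Usage: gather experience with a random policy, then sample a trace.
rng = random.Random(0)
experience = []
state = 0
for _ in range(2000):
    action = rng.choice(ACTIONS)
    next_state = step(state, action)
    experience.append((state, action, next_state))
    state = next_state

backward_model = fit_backward_model(experience)
recall_trace = sample_recall_trace(backward_model, N_STATES - 1, 4, rng)
print(recall_trace)
```

Each sampled trace is a short sequence of (state, action) pairs that ends at the chosen high-value state. One plausible way to use such traces, in line with the abstract's statement that they can improve a policy, is to treat the pairs as supervised targets for the policy. That step is left out here because the abstract does not specify how it is done.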
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Advanced Bandit Algorithms Research
