TL;DR
This paper presents an exploration method for reinforcement learning (RL) based on the successor representation (SR). The method extends to deep RL and achieves state-of-the-art results in Atari games.
Contribution
It proposes using the norm of the successor representation (SR) as an exploration bonus, introduces the substochastic successor representation (SSR) to explain why that bonus works, and extends the approach to deep RL with strong empirical performance.
Findings
The norm of the SR, while the SR is still being learned, can serve as an effective exploration bonus.
The SSR implicitly counts how many times each state (or feature) has been observed, which aids exploration.
The deep RL extension achieves state-of-the-art Atari performance.
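
The first finding can be illustrated with a minimal tabular sketch. It assumes a standard TD(0) update for the SR and a table initialized to zero; the inverse-L1-norm form of the bonus, the function names, and the values of `alpha`, `gamma`, and `beta` are assumptions made for illustration, not details taken from this summary.

```python
def td_update_sr(sr, s, s_next, alpha=0.1, gamma=0.95):
    """Apply one TD(0) update to row s of a tabular SR (list of lists)."""
    for j in range(len(sr)):
        indicator = 1.0 if j == s else 0.0
        target = indicator + gamma * sr[s_next][j]
        sr[s][j] += alpha * (target - sr[s][j])


def exploration_bonus(sr, s, beta=1.0, eps=1e-8):
    """Assumed bonus form: inverse L1 norm of the SR row for state s."""
    norm = sum(abs(x) for x in sr[s])
    return beta / (norm + eps)


# Hypothetical 3-state trajectory: states 0 and 1 are visited often,
# state 2 is left only once.
n_states = 3
sr = [[0.0] * n_states for _ in range(n_states)]
transitions = [(0, 1), (1, 0)] * 50 + [(0, 2), (2, 0)]
for s, s_next in transitions:
    td_update_sr(sr, s, s_next)

bonuses = [exploration_bonus(sr, s) for s in range(n_states)]
```

Because every row starts at zero and grows with each update, the rarely updated state 2 ends with a smaller norm and therefore a larger bonus than states 0 and 1.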
Abstract
In this paper we introduce a simple approach for exploration in reinforcement learning (RL) that allows us to develop theoretically justified algorithms in the tabular case but that is also extendable to settings where function approximation is required. Our approach is based on the successor representation (SR), which was originally introduced as a representation defining state generalization by the similarity of successor states. Here we show that the norm of the SR, while it is being learned, can be used as a reward bonus to incentivize exploration. In order to better understand this transient behavior of the norm of the SR we introduce the substochastic successor representation (SSR) and we show that it implicitly counts the number of times each state (or feature) has been observed. We use this result to introduce an algorithm that performs as well as some theoretically…
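
The counting claim can be checked numerically with a small sketch. The abstract is cut off before the SSR is defined, so the construction below is an assumption: transition counts are normalized by n(s) + 1 rather than n(s), which makes each row of the empirical transition matrix sum to less than one, and the SSR is taken to be the inverse of (I − γ·P̃). Under that assumption, the quantity χ(s) = (1 + γ) − ‖SSR(s)‖₁ is bounded above by γ / (n(s) + 1) and below by γ / (n(s) + 1) − γ² / (1 − γ), so it behaves like an inverse visit count when γ is small. The function name and the example counts are hypothetical.

```python
def ssr_row_norms(counts, gamma=0.9, iters=500):
    """Return the L1 norm of each SSR row, plus per-state visit counts.

    counts[s][j] is the number of observed transitions s -> j.
    Assumed construction: P_tilde[s][j] = counts[s][j] / (n(s) + 1).
    The row sums v of (I - gamma * P_tilde)^-1 satisfy
    v = 1 + gamma * P_tilde v, solved here by fixed-point iteration.
    """
    n_states = len(counts)
    visits = [sum(row) for row in counts]
    p_tilde = [[c / (visits[s] + 1) for c in counts[s]]
               for s in range(n_states)]
    v = [1.0] * n_states
    for _ in range(iters):
        v = [1.0 + gamma * sum(p_tilde[s][j] * v[j] for j in range(n_states))
             for s in range(n_states)]
    return v, visits


# Hypothetical counts: states 0 and 1 seen 40 times each, state 2 never left.
gamma = 0.9
counts = [[0, 40, 0],
          [39, 0, 1],
          [0, 0, 0]]
norms, visits = ssr_row_norms(counts, gamma=gamma)
chi = [(1 + gamma) - v for v in norms]
```

All SSR entries are non-negative, so each row's L1 norm equals its row sum, which is why a single vector iteration is enough here. The unvisited state 2 attains the upper bound exactly, while the frequently visited states have a much smaller χ.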
