Time-Myopic Go-Explore: Learning A State Representation for the Go-Explore Paradigm
Marc Höftmann, Jan Robine, Stefan Harmeling

TL;DR
This paper introduces a novel time-myopic state representation for the Go-Explore reinforcement learning paradigm, improving exploration efficiency in large, sparse reward environments by estimating novelty without handcrafted heuristics.
Contribution
The work presents the first learned state representation for Go-Explore that estimates novelty and addresses the detachment problem, enhancing exploration in complex environments.
Findings
Improved exploration in Atari games such as Montezuma's Revenge, Gravitar, and Frostbite.
Reliable estimation of state novelty without handcrafted heuristics.
Effective coverage of the entire state space with respect to time trajectories.
Abstract
Very large state spaces with a sparse reward signal are difficult to explore. The lack of sophisticated guidance results in poor performance for numerous reinforcement learning algorithms. In these cases, the commonly used random exploration is often not helpful. The literature shows that these kinds of environments require enormous effort to systematically explore large chunks of the state space. Learned state representations can help here to improve the search by providing semantic context and building a structure on top of the raw observations. In this work we introduce a novel time-myopic state representation that clusters temporally close states together while providing a time prediction capability between them. By adapting this model to the Go-Explore paradigm (Ecoffet et al., 2021b), we demonstrate the first learned state representation that reliably estimates novelty instead of…
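To make the idea of time-based novelty concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that a state counts as novel when its predicted time distance to every archived state exceeds a threshold, and it replaces the learned predictor with a hypothetical hand-written function, `toy_time_distance`, on 1-D toy states. The names `is_novel`, `update_archive`, `horizon`, and `threshold` are illustrative and do not come from the paper.

```python
from typing import Callable, List, Tuple

State = Tuple[float, ...]
Distance = Callable[[State, State], float]


def toy_time_distance(a: State, b: State, horizon: float = 10.0) -> float:
    """Hypothetical stand-in for a learned time predictor.

    Returns a value in [0, 1]. Distances beyond `horizon` steps saturate
    at 1.0, which mimics a "myopic" predictor that only resolves short
    time spans. A real model would be trained on trajectories.
    """
    steps = sum(abs(x - y) for x, y in zip(a, b))
    return min(steps, horizon) / horizon


def is_novel(candidate: State, archive: List[State],
             distance: Distance, threshold: float = 0.5) -> bool:
    """A candidate is novel if it is far (in predicted time) from all archived states."""
    return all(distance(s, candidate) > threshold for s in archive)


def update_archive(archive: List[State], trajectory: List[State],
                   distance: Distance, threshold: float = 0.5) -> int:
    """Append novel states from a trajectory to the archive; return how many were added."""
    added = 0
    for state in trajectory:
        if is_novel(state, archive, distance, threshold):
            archive.append(state)
            added += 1
    return added


# Toy usage: a 1-D walk where the state value equals the time step.
archive: List[State] = [(0.0,)]
trajectory: List[State] = [(float(t),) for t in range(1, 21)]
added = update_archive(archive, trajectory, toy_time_distance)
print(added, archive)
```

In this toy setting the archive ends up holding states spaced roughly one threshold apart along the trajectory, which is the kind of coverage a Go-Explore archive aims for. The threshold and horizon values are arbitrary choices for illustration.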
Taxonomy
Topics: Reinforcement Learning in Robotics · Scientific Computing and Data Management · Advanced Bandit Algorithms Research
Methods: Go-Explore
