Time-Myopic Go-Explore: Learning A State Representation for the Go-Explore Paradigm

Marc Höftmann; Jan Robine; Stefan Harmeling

arXiv:2301.05635 · cs.LG · January 16, 2023

TL;DR

This paper introduces a novel time-myopic state representation for the Go-Explore reinforcement learning paradigm. The representation estimates state novelty without handcrafted heuristics, which improves exploration efficiency in large environments with sparse rewards.

Contribution

The work presents the first learned state representation for Go-Explore that estimates novelty and addresses the detachment problem, enhancing exploration in complex environments.
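A minimal sketch of how a learned novelty estimate could plug into a Go-Explore archive. It assumes a hypothetical `time_distance` function standing in for the learned time predictor, returning a normalized number of steps between two states. A state counts as novel when its predicted distance to every archived cell reaches a threshold, and the "go" step picks a cell with weights 1/sqrt(visits + 1), a count-based heuristic in the spirit of Go-Explore. The names `Cell`, `NoveltyArchive`, `toy_time_distance`, and `threshold` are illustrative and are not taken from the paper or its repository.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

State = Tuple[int, int]  # toy state: grid coordinates (assumption)


@dataclass
class Cell:
    state: State
    visits: int = 0


@dataclass
class NoveltyArchive:
    # Hypothetical stand-in for the learned time predictor:
    # maps two states to a normalized predicted step count in [0, 1].
    time_distance: Callable[[State, State], float]
    threshold: float
    cells: List[Cell] = field(default_factory=list)

    def is_novel(self, state: State) -> bool:
        # Novel if predicted to be "far in time" from every archived cell.
        # all() on an empty archive is True, so the first state is always novel.
        return all(self.time_distance(c.state, state) >= self.threshold
                   for c in self.cells)

    def try_add(self, state: State) -> bool:
        if self.is_novel(state):
            self.cells.append(Cell(state))
            return True
        return False

    def select(self, rng: random.Random) -> Cell:
        # Count-based selection: rarely chosen cells get higher weight.
        weights = [1.0 / math.sqrt(c.visits + 1) for c in self.cells]
        cell = rng.choices(self.cells, weights=weights, k=1)[0]
        cell.visits += 1
        return cell


def toy_time_distance(a: State, b: State, horizon: int = 10) -> float:
    # Hypothetical placeholder for a trained model: Manhattan distance
    # clipped at `horizon` steps and scaled to [0, 1].
    steps = abs(a[0] - b[0]) + abs(a[1] - b[1])
    return min(steps, horizon) / horizon
```

With `toy_time_distance` and a threshold of 0.25, the state (1, 1) is rejected once (0, 0) is archived (predicted distance 0.2), while (4, 0) is accepted (0.4). In the paper's setting, the toy function would be replaced by the learned representation.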

Findings

1. Improved exploration in Atari games like MontezumaRevenge, Gravitar, and Frostbite.
2. Reliable estimation of state novelty without handcrafted heuristics.
3. Effective coverage of the entire state space with respect to time trajectories.

Abstract

Very large state spaces with a sparse reward signal are difficult to explore. The lack of sophisticated guidance results in poor performance for numerous reinforcement learning algorithms. In these cases, the commonly used random exploration is often not helpful. The literature shows that these kinds of environments require enormous effort to systematically explore large chunks of the state space. Learned state representations can help here to improve the search by providing semantic context and building a structure on top of the raw observations. In this work, we introduce a novel time-myopic state representation that clusters temporally close states together while providing a time prediction capability between them. By adapting this model to the Go-Explore paradigm (Ecoffet et al., 2021b), we demonstrate the first learned state representation that reliably estimates novelty instead of…
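The abstract describes a representation that predicts how far apart in time two states are, with a limited ("myopic") horizon. A minimal sketch of how such training targets could be built from one trajectory, assuming the step distance is clipped at a hypothetical `horizon` and scaled to [0, 1]. The paper's actual loss, normalization, and pair sampling may differ, and `time_target` and `sample_time_pairs` are illustrative names.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

S = TypeVar("S")


def time_target(i: int, j: int, horizon: int) -> float:
    """Clipped, normalized temporal distance between steps i and j (assumed target)."""
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    # Pairs farther apart than the horizon all share the label 1.0 ("far away").
    return min(abs(j - i), horizon) / horizon


def sample_time_pairs(
    trajectory: Sequence[S],
    horizon: int,
    num_pairs: int,
    rng: random.Random,
) -> List[Tuple[S, S, float]]:
    """Draw (state_i, state_j, target) triples uniformly from one trajectory."""
    n = len(trajectory)
    pairs: List[Tuple[S, S, float]] = []
    for _ in range(num_pairs):
        i = rng.randrange(n)
        j = rng.randrange(n)
        pairs.append((trajectory[i], trajectory[j], time_target(i, j, horizon)))
    return pairs
```

Because uniformly drawn pairs are mostly farther apart than the horizon, one might instead draw `j` from a window around `i` to balance near and far labels; that choice is likewise an assumption, not a detail confirmed by the paper.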

Code & Models

Repositories

hauf3n/time-myopic-go-explore
PyTorch · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Scientific Computing and Data Management · Advanced Bandit Algorithms Research

Methods: Go-Explore