A Single Goal is All You Need: Skills and Exploration Emerge from Contrastive RL without Rewards, Demonstrations, or Subgoals

Grace Liu; Michael Tang; Benjamin Eysenbach

arXiv:2408.05804 · cs.LG · August 13, 2024

TL;DR

This paper demonstrates that skills and directed exploration can emerge from a simple contrastive reinforcement learning algorithm without rewards, demonstrations, or subgoals, challenging assumptions about exploration requirements.

Contribution

It presents a contrastive RL method, a simple modification of prior work, in which skills and directed exploration emerge without rewards, demonstrations, or subgoals.

Findings

01. Skills emerge before the first successful task completion

02. Exploration decreases once the agent reaches the goal reliably

03. The method works without density estimates, ensembles, or extra hyperparameters

Abstract

In this paper, we present empirical evidence of skills and directed exploration emerging from a simple RL algorithm long before any successful trials are observed. For example, in a manipulation task, the agent is given a single observation of the goal state and learns skills, first for moving its end-effector, then for pushing the block, and finally for picking up and placing the block. These skills emerge before the agent has ever successfully placed the block at the goal location and without the aid of any reward functions, demonstrations, or manually-specified distance metrics. Once the agent has learned to reach the goal state reliably, exploration is reduced. Implementing our method involves a simple modification of prior work and does not require density estimates, ensembles, or any additional hyperparameters. Intuitively, the proposed method seems like it should be terrible at…
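One way to picture the kind of contrastive objective the title refers to is the sketch below. It is an illustration under assumptions, not the authors' implementation: the abstract does not specify the loss, so a generic InfoNCE-style loss is swapped in, and the names `infonce_loss`, `critic_score`, `sa_repr`, and `goal_repr` are hypothetical.

```python
import numpy as np


def infonce_loss(sa_repr, goal_repr):
    """InfoNCE-style loss (an assumed stand-in for the paper's objective).

    Row i of sa_repr (state-action embeddings) is treated as the positive
    match for row i of goal_repr; all other rows act as negatives.
    """
    logits = sa_repr @ goal_repr.T  # (B, B) similarity matrix
    # Numerically stable log-softmax over each row.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal.
    return float(-np.mean(np.diag(log_probs)))


def critic_score(sa_repr, single_goal_repr):
    """Score each state-action embedding against one goal embedding."""
    return sa_repr @ single_goal_repr


# Uninformative embeddings: every candidate looks alike, so loss = log(B).
uninformative = infonce_loss(np.zeros((8, 4)), np.zeros((8, 4)))

# Well-separated embeddings: positives dominate, so the loss is near zero.
separated = infonce_loss(10.0 * np.eye(4), np.eye(4))

# Scores of 8 state-action embeddings against a single goal embedding.
scores = critic_score(np.ones((8, 4)), np.ones(4))
```

In a single-goal setup like the one the abstract describes, `critic_score` would be evaluated against the representation of the one goal observation. How the paper turns such scores into actions is not stated in the text above.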

Videos

A Single Goal is All You Need: Skills and Exploration Emerge from Contrastive RL without Rewards, Demonstrations, or Subgoals · SlidesLive

Taxonomy

Topics: Teaching and Learning Programming · Intelligent Tutoring Systems and Adaptive Learning · Innovative Teaching and Learning Methods