A Single Goal is All You Need: Skills and Exploration Emerge from Contrastive RL without Rewards, Demonstrations, or Subgoals
Grace Liu, Michael Tang, Benjamin Eysenbach

TL;DR
This paper shows that skills and directed exploration can emerge from a simple contrastive reinforcement learning algorithm without rewards, demonstrations, or subgoals, challenging common assumptions about what exploration requires.
Contribution
It shows that a simple modification of prior contrastive RL work develops skills and directed exploration on its own, with no additional learning signals.
Findings
Skills emerge before the first successful trial
Exploration decreases once the agent reliably reaches the goal
The method requires no density estimates, ensembles, or additional hyperparameters
Abstract
In this paper, we present empirical evidence of skills and directed exploration emerging from a simple RL algorithm long before any successful trials are observed. For example, in a manipulation task, the agent is given a single observation of the goal state and learns skills, first for moving its end-effector, then for pushing the block, and finally for picking up and placing the block. These skills emerge before the agent has ever successfully placed the block at the goal location and without the aid of any reward functions, demonstrations, or manually specified distance metrics. Once the agent has learned to reach the goal state reliably, exploration is reduced. Implementing our method involves a simple modification of prior work and does not require density estimates, ensembles, or any additional hyperparameters. Intuitively, the proposed method seems like it should be terrible at…
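The abstract describes the method only as a simple modification of prior contrastive RL work, so the sketch below shows the generic contrastive RL ingredients it builds on, not the authors' modification. It assumes a critic that scores a state-action pair against a goal by the inner product of two learned representations, phi(s, a) and psi(g), trained with an InfoNCE loss in which a future state from the same trajectory is the positive and the other goals in the batch are negatives. The function names, the pure-Python vectors standing in for neural-network outputs, and the greedy choice over a discrete set of candidate actions (in place of a learned actor) are hypothetical simplifications.

```python
import math
from typing import List, Sequence

# Hypothetical stand-in: plain float vectors play the role of network outputs.
Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def critic_logits(sa_reprs: List[Vector], goal_reprs: List[Vector]) -> List[List[float]]:
    """logits[i][j] = phi(s_i, a_i) . psi(g_j) for every pair in the batch."""
    return [[dot(phi, psi) for psi in goal_reprs] for phi in sa_reprs]


def infonce_loss(sa_reprs: List[Vector], goal_reprs: List[Vector]) -> float:
    """Row-wise InfoNCE loss, averaged over the batch.

    Assumption: goal_reprs[i] encodes a future state from the same trajectory
    as (s_i, a_i), so it is the positive for row i; every other goal in the
    batch acts as a negative.
    """
    logits = critic_logits(sa_reprs, goal_reprs)
    total = 0.0
    for i, row in enumerate(logits):
        # Numerically stable log-sum-exp over the row.
        m = max(row)
        log_norm = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_norm - row[i]  # equals -log softmax(row)[i]
    return total / len(logits)


def greedy_action(phi_per_action: List[Vector], goal_repr: Vector) -> int:
    """Index of the candidate action that scores highest against the one goal.

    Hypothetical simplification: a real contrastive RL agent trains an actor
    network to maximize the critic instead of enumerating actions.
    """
    scores = [dot(phi, goal_repr) for phi in phi_per_action]
    return max(range(len(scores)), key=scores.__getitem__)
```

With uninformative (all-zero) representations every logit is equal and the loss is log B for a batch of size B; as each positive pair's score pulls ahead of its negatives, the loss falls toward zero. Scoring every action against one fixed goal representation, as `greedy_action` does, loosely mirrors the setting in the abstract, where the agent is only ever given a single observation of the goal state.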
