Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration

Max Wilcoxson; Qiyang Li; Kevin Frans; Sergey Levine

arXiv:2410.18076 · cs.LG · July 15, 2025

TL;DR

SUPE (Skills from Unlabeled Prior data for Exploration) uses unlabeled offline trajectory data in two ways: it pretrains low-level skills with a variational autoencoder (VAE), and it pseudo-labels the prior data so that it can serve as additional off-policy data for online RL. Combining the two yields more efficient online exploration than prior strategies on 42 long-horizon tasks.

Contribution

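The paper shows that two established uses of unlabeled prior data, skill pretraining and reuse as off-policy data for online RL, compound when combined carefully: skills are extracted with a VAE, and the prior data is pseudo-labeled into high-level, task-relevant examples that drive exploration.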

Findings

01. SUPE outperforms prior strategies on 42 long-horizon tasks.

02. Combining skill learning with pseudo-labeled data enhances exploration.

03. Transforming prior data into high-level, task-relevant examples is effective.

Abstract

Unsupervised pretraining has been transformative in many supervised domains. However, applying such ideas to reinforcement learning (RL) presents a unique challenge in that fine-tuning does not involve mimicking task-specific data, but rather exploring and locating the solution through iterative self-improvement. In this work, we study how unlabeled offline trajectory data can be leveraged to learn efficient exploration strategies. While prior data can be used to pretrain a set of low-level skills, or as additional off-policy data for online RL, it has been unclear how to combine these ideas effectively for online exploration. Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits. Our method first extracts low-level skills using a variational autoencoder (VAE), and then pseudo-labels unlabeled…
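The data transformation described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The function names, the fixed segment length `horizon`, and the tuple layout are assumptions made for illustration. The encoder is a stand-in for a pretrained VAE encoder, and the reward label comes from a caller-supplied function, because the truncated abstract does not say how labels are computed.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]
Skill = Tuple[float, ...]
HighLevelTransition = Tuple[State, Skill, float, State]


def mean_action_encoder(states: Sequence[State], actions: Sequence[Action]) -> Skill:
    """Hypothetical stand-in for a pretrained VAE encoder.

    It ignores the states and averages the segment's actions per dimension.
    """
    dim = len(actions[0])
    return tuple(sum(a[i] for a in actions) / len(actions) for i in range(dim))


def to_high_level_transitions(
    states: Sequence[State],
    actions: Sequence[Action],
    horizon: int,
    encode: Callable[[Sequence[State], Sequence[Action]], Skill],
    reward_fn: Callable[[State, Skill, State], float],
) -> List[HighLevelTransition]:
    """Turn one low-level trajectory into high-level (s, z, r, s') tuples.

    The trajectory is cut into non-overlapping segments of `horizon` steps.
    Each segment is encoded into a skill label z, and the pseudo-reward r is
    produced by the caller-supplied `reward_fn` (an assumption of this sketch).
    """
    if len(states) != len(actions) + 1:
        raise ValueError("expected exactly one more state than actions")
    transitions: List[HighLevelTransition] = []
    for start in range(0, len(actions) - horizon + 1, horizon):
        end = start + horizon
        skill = encode(states[start:end], actions[start:end])
        s, s_next = states[start], states[end]
        transitions.append((s, skill, reward_fn(s, skill, s_next), s_next))
    return transitions


# Toy usage: a 1-D trajectory with 8 steps, cut into two 4-step segments.
states = [(float(t),) for t in range(9)]
actions = [(1.0,)] * 8
out = to_high_level_transitions(
    states,
    actions,
    horizon=4,
    encode=mean_action_encoder,
    reward_fn=lambda s, z, s_next: 0.0,  # placeholder pseudo-reward
)
```

The resulting tuples have the shape an off-policy learner would need to train a high-level policy that picks skills rather than raw actions. The choice of encoder and pseudo-reward is deliberately left open here.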

Code & Models

Repositories

rail-berkeley/supe (JAX, official)

Videos

Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration · SlidesLive

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Distributed and Parallel Computing Systems · Open Education and E-Learning

Methods: Sparse Evolutionary Training