Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration
Max Wilcoxson, Qiyang Li, Kevin Frans, Sergey Levine

TL;DR
This paper introduces SUPE, a method that uses unlabeled offline data to learn exploration strategies in reinforcement learning. It extracts low-level skills from the data, pseudo-labels the unlabeled trajectories, and combines the two for efficient online exploration, outperforming prior methods.
Contribution
The paper proposes combining skill extraction with pseudo-labeling of unlabeled data to improve exploration in RL, and reports significant performance gains over prior methods.
Findings
SUPE outperforms prior strategies on 42 long-horizon tasks.
Combining skill learning with pseudo-labeled data enhances exploration.
Transforming prior data into high-level, task-relevant examples is effective.
Abstract
Unsupervised pretraining has been transformative in many supervised domains. However, applying such ideas to reinforcement learning (RL) presents a unique challenge in that fine-tuning does not involve mimicking task-specific data, but rather exploring and locating the solution through iterative self-improvement. In this work, we study how unlabeled offline trajectory data can be leveraged to learn efficient exploration strategies. While prior data can be used to pretrain a set of low-level skills, or as additional off-policy data for online RL, it has been unclear how to combine these ideas effectively for online exploration. Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits. Our method first extracts low-level skills using a variational autoencoder (VAE), and then pseudo-labels unlabeled…
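The abstract describes turning unlabeled trajectories into high-level examples for a policy that composes pretrained skills. The sketch below illustrates that data transformation under stated assumptions and is not the authors' implementation. `encode_skill` stands in for the VAE skill encoder, and `pseudo_reward` stands in for whatever reward estimate is used for pseudo-labeling (the abstract is truncated before specifying it). The skill horizon, the non-overlapping chunking, and the undiscounted reward sum are illustrative choices. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Obs = Tuple[float, ...]
Action = Tuple[float, ...]


@dataclass
class HighLevelTransition:
    """One high-level example: start state, skill latent, summed pseudo-reward, end state."""
    obs: Obs
    skill: Tuple[float, ...]
    reward: float
    next_obs: Obs


def relabel_trajectory(
    observations: Sequence[Obs],
    actions: Sequence[Action],
    horizon: int,
    encode_skill: Callable[[Sequence[Action]], Tuple[float, ...]],  # hypothetical stand-in for a VAE encoder
    pseudo_reward: Callable[[Obs, Action], float],  # hypothetical stand-in for a reward estimate
) -> List[HighLevelTransition]:
    """Chunk an unlabeled trajectory into non-overlapping `horizon`-step segments.

    `observations` holds one more entry than `actions` (the final state).
    Any trailing actions that do not fill a full segment are dropped.
    """
    if len(observations) != len(actions) + 1:
        raise ValueError("expected len(observations) == len(actions) + 1")
    transitions = []
    for start in range(0, len(actions) - horizon + 1, horizon):
        segment = actions[start:start + horizon]
        # Summarize the low-level action chunk as a single skill latent.
        z = encode_skill(segment)
        # Sum per-step pseudo-rewards over the chunk (undiscounted, for simplicity).
        r = sum(pseudo_reward(observations[start + k], segment[k]) for k in range(horizon))
        transitions.append(
            HighLevelTransition(observations[start], z, r, observations[start + horizon])
        )
    return transitions


# Toy stand-ins for illustration only.
def mean_action_encoder(segment: Sequence[Action]) -> Tuple[float, ...]:
    """Per-dimension mean of the chunk's actions (a placeholder, not a learned VAE)."""
    dims = len(segment[0])
    return tuple(sum(a[d] for a in segment) / len(segment) for d in range(dims))


def progress_reward(obs: Obs, action: Action) -> float:
    """Placeholder reward: the first state coordinate."""
    return obs[0]


# A 1-D trajectory with 9 states and 8 unit actions, chunked into 4-step skills.
demo = relabel_trajectory(
    observations=[(float(t),) for t in range(9)],
    actions=[(1.0,)] * 8,
    horizon=4,
    encode_skill=mean_action_encoder,
    pseudo_reward=progress_reward,
)
```

The resulting transitions have the shape an off-policy learner for a high-level, skill-selecting policy would consume: one decision per chunk rather than per low-level step.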
