EUCLID: Towards Efficient Unsupervised Reinforcement Learning with Multi-choice Dynamics Model

Yifu Yuan; Jianye Hao; Fei Ni; Yao Mu; Yan Zheng; Yujing Hu; Jinyi Liu; Yingfeng Chen; Changjie Fan

arXiv:2210.00498·cs.LG·February 23, 2023·1 cites

TL;DR

EUCLID introduces a multi-choice dynamics model for unsupervised reinforcement learning, jointly pre-training dynamics and exploration policies to enhance sample efficiency and performance in downstream tasks.

Contribution

The paper proposes a novel multi-choice dynamics model and a model-fused pre-training framework for unsupervised RL, improving sample efficiency and generalization across behaviors.

Findings

01  Achieves state-of-the-art performance on the URLB benchmark.

02  Reaches a mean normalized score of 104.0% on state-based URLB with only 100k fine-tuning steps.

03  Matches the performance of DDPG trained for 2M steps while using 20x less data.
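
To make the normalized-score figure concrete, here is a minimal sketch of how such a score could be computed. It assumes the score is the agent's return divided by a reference (expert) return for each task, averaged over tasks and expressed as a percentage. The function name, the task names, and all numbers are hypothetical and are not results from the paper.

```python
def normalized_score(agent_returns, expert_returns):
    """Mean of per-task agent/expert return ratios, as a percentage.

    Assumption: both arguments map task name -> episode return and
    share the same keys. This is an illustrative definition only.
    """
    ratios = [agent_returns[task] / expert_returns[task] for task in expert_returns]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical returns for two made-up tasks (not taken from the paper).
agent = {"walk": 450.0, "run": 300.0}
expert = {"walk": 400.0, "run": 400.0}

print(normalized_score(agent, expert))  # 93.75
```

Under this definition, a score above 100% means the fine-tuned agent exceeds the reference returns on average.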

Abstract

Unsupervised reinforcement learning (URL) is a promising paradigm for learning useful behaviors in a task-agnostic environment, without the guidance of extrinsic rewards, to facilitate fast adaptation to various downstream tasks. Previous works focused on pre-training in a model-free manner and did not study transition dynamics modeling, which leaves considerable room for improving sample efficiency in downstream tasks. To this end, we propose an Efficient Unsupervised Reinforcement Learning Framework with Multi-choice Dynamics model (EUCLID), which introduces a novel model-fused paradigm that jointly pre-trains the dynamics model and the unsupervised exploration policy in the pre-training phase, thus making better use of the environmental samples and improving sample efficiency on downstream tasks. However, constructing a generalizable model which captures the local dynamics…
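
The idea of jointly pre-training a dynamics model and an exploration policy on the same reward-free samples can be sketched with a toy example. This is not the authors' implementation and does not include the multi-choice component. The 1-D linear environment, the linear model, the epsilon-greedy policy, and the use of prediction error as an intrinsic reward are all assumptions chosen only to keep the sketch small.

```python
import random

ACTIONS = (-1.0, 0.0, 1.0)


def env_step(state, action):
    """Hypothetical toy environment: noise-free 1-D linear dynamics."""
    return 0.9 * state + 0.5 * action


class LinearDynamicsModel:
    """Predicts next_state = w_s * state + w_a * action."""

    def __init__(self):
        self.w_s = 0.0
        self.w_a = 0.0

    def predict(self, state, action):
        return self.w_s * state + self.w_a * action

    def update(self, state, action, next_state, lr=0.05):
        # One SGD step on the squared one-step prediction error.
        error = self.predict(state, action) - next_state
        self.w_s -= lr * error * state
        self.w_a -= lr * error * action
        # Return the squared error measured before the update.
        return error * error


class ExplorationPolicy:
    """Epsilon-greedy over a running estimate of intrinsic reward per action."""

    def __init__(self, epsilon=0.2):
        self.epsilon = epsilon
        self.value = {a: 0.0 for a in ACTIONS}

    def act(self, rng):
        if rng.random() < self.epsilon:
            return rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.value[a])

    def update(self, action, intrinsic_reward, lr=0.1):
        self.value[action] += lr * (intrinsic_reward - self.value[action])


def pretrain(num_steps=2000, seed=0):
    """Reward-free pre-training: each transition updates model and policy."""
    rng = random.Random(seed)
    model, policy = LinearDynamicsModel(), ExplorationPolicy()
    state = 1.0
    for _ in range(num_steps):
        action = policy.act(rng)
        next_state = env_step(state, action)
        # Assumed intrinsic reward: the model's prediction error on this step.
        intrinsic_reward = model.update(state, action, next_state)
        policy.update(action, intrinsic_reward)
        state = next_state
    return model, policy
```

Calling `pretrain()` returns both components. In this toy setting the learned weights `model.w_s` and `model.w_a` end up closer to the true coefficients 0.9 and 0.5 than their zero initialization, so a downstream task could start from a dynamics model that already fits the environment.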

Videos

EUCLID: Towards Efficient Unsupervised Reinforcement Learning with Multi-choice Dynamics Model (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics · Human Pose and Action Recognition