METRA: Scalable Unsupervised RL with Metric-Aware Abstraction

Seohong Park; Oleh Rybkin; Sergey Levine

arXiv:2310.08887·cs.LG·March 12, 2024·2 cites

TL;DR

METRA is a scalable unsupervised reinforcement learning method. Rather than trying to cover the entire state space of a high-dimensional environment, it learns to cover a compact latent space that is connected to the state space by temporal distances, which lets it discover diverse behaviors.

Contribution

The paper proposes Metric-Aware Abstraction (METRA), a novel unsupervised RL objective that effectively scales to complex environments by focusing on a latent space linked to the state space through temporal distances.
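As a rough illustration of what "linked to the state space through temporal distances" means in practice, the sketch below computes the kind of intrinsic reward the METRA paper describes: the latent displacement `phi(s') - phi(s)` projected onto a skill vector `z`, with the encoder constrained so that adjacent states are at most one unit apart in latent space. The function names, the toy arrays, and the use of NumPy are assumptions made for illustration; this is not the authors' implementation, and how the constraint is enforced during training is not shown.

```python
import numpy as np

def metra_reward(phi_s, phi_next, z):
    """Intrinsic reward (phi(s') - phi(s))^T z for a batch of transitions.

    phi_s, phi_next: latent encodings of s and s', shape (batch, dim).
    z: skill vectors, shape (batch, dim).
    """
    return np.sum((phi_next - phi_s) * z, axis=-1)

def constraint_violation(phi_s, phi_next):
    """How far ||phi(s') - phi(s)||_2 exceeds 1 for adjacent states.

    Returns 0 wherever the temporal-distance constraint is satisfied.
    """
    step = np.linalg.norm(phi_next - phi_s, axis=-1)
    return np.maximum(step - 1.0, 0.0)

# Hypothetical latent encodings for two transitions (not real data).
phi_s = np.array([[0.0, 0.0], [1.0, 1.0]])
phi_next = np.array([[0.6, 0.0], [1.0, 3.0]])
z = np.array([[1.0, 0.0], [0.0, 1.0]])

rewards = metra_reward(phi_s, phi_next, z)
violations = constraint_violation(phi_s, phi_next)
print(rewards, violations)
```

In this toy batch the first transition moves 0.6 units along its skill direction and satisfies the constraint, while the second moves 2 units in a single step and would be penalized. Because one environment step can move the latent by at most one unit, reaching a distant latent point requires many steps, which is how the latent space comes to reflect temporal distance.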

Findings

01. METRA successfully discovers diverse locomotion behaviors in pixel-based environments.

02. It is the first unsupervised RL method to find such behaviors in Quadruped and Humanoid environments.

03. METRA demonstrates scalability and effectiveness in high-dimensional, complex tasks.

Abstract

Unsupervised pre-training strategies have proven to be highly effective in natural language processing and computer vision. Likewise, unsupervised reinforcement learning (RL) holds the promise of discovering a variety of potentially useful behaviors that can accelerate the learning of a wide array of downstream tasks. Previous unsupervised RL approaches have mainly focused on pure exploration and mutual information skill learning. However, despite the previous attempts, making unsupervised RL truly scalable still remains a major open challenge: pure exploration approaches might struggle in complex environments with large state spaces, where covering every possible transition is infeasible, and mutual information skill learning approaches might completely fail to explore the environment due to the lack of incentives. To make unsupervised RL scalable to complex, high-dimensional…

Peer Reviews

Decision·ICLR 2024 oral

Reviewer 01 · Rating 8 (accept, good paper) · Confidence 4

Strengths

- The empirical study of this paper is sound and solid. The paper evaluates the method on various control tasks, including locomotion and manipulation tasks. It also aims to address the unsupervised RL problem on vision-based tasks, which are much more challenging. The paper compares its results to multiple previous works, showing a significant improvement in skill discovery.
- The methodology part is very organized. The authors aim to maximize state coverage…

Weaknesses

- The paper would be more impactful and solid if the method were deployed on real-world tasks, such as locomotion control on a real robot. In addition, as the authors already note in Appendix A, the method could be combined with more recent RL work.

Reviewer 02 · Rating 8 (accept, good paper) · Confidence 5

Strengths

1. The paper introduces a novel unsupervised RL objective, METRA, which is a significant contribution to the field. The idea of using temporal distances as a metric for the latent space is innovative and provides a new perspective on unsupervised RL.
2. The paper is technically sound, and the proposed method is well-motivated and clearly explained. The authors provide a thorough theoretical analysis of their method, including a connection to principal component analysis (PCA).
3. The paper is we…

Weaknesses

1. While the paper presents results on a variety of environments, it would be beneficial to see how METRA performs on more complex environments such as Atari [1] or Google Research Football [2]. This would provide a more comprehensive evaluation of the method's scalability and effectiveness.
2. The paper could benefit from a comparison with more diversity RL baselines, such as RSPO [3] and DGPO [4]. This would provide a more complete picture of how METRA compares to other state-of-the-art methods in…

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

1. This paper is well-structured. The authors first analyze the common limitations of existing unsupervised RL approaches and then provide solid theoretical and empirical evidence to show why and how the proposed method works, making this paper understandable.
2. Experiments are well-described and highly reproducible. Experiments have good coverage. The selection of baselines and environments is reasonable and convincing.

Weaknesses

There are no significant weaknesses in this paper. The theoretical explanation of why WDM is chosen as the objective might be a little complicated for readers who lack the corresponding background. Some explicit examples or pictures may help.
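For readers without that background, the WDM (Wasserstein dependency measure) referred to here can be written out as below. This is a sketch recalled from the METRA paper rather than something stated on this page, so the notation (`f`, `\phi`, `\mathcal{S}_{\mathrm{adj}}`) should be read as an assumption and checked against the paper. The first expression is the general definition; the second is the tractable form the paper arrives at after restricting `f` and measuring distance in the state space by the number of environment steps.

```latex
% Wasserstein dependency measure between states S and skills Z,
% via Kantorovich-Rubinstein duality (f ranges over 1-Lipschitz functions):
I_{\mathcal{W}}(S; Z)
  = \sup_{\|f\|_{L} \le 1}
    \mathbb{E}_{p(s, z)}\!\left[ f(s, z) \right]
    - \mathbb{E}_{p(s)\,p(z)}\!\left[ f(s, z) \right]

% Tractable objective with a learned encoder \phi and policy \pi,
% where S_adj is the set of adjacent state pairs (one step apart):
\sup_{\pi,\, \phi}\;
  \mathbb{E}_{p(\tau, z)}\!\left[ \sum_{t=0}^{T-1}
    \big( \phi(s_{t+1}) - \phi(s_t) \big)^{\top} z \right]
\quad \text{s.t.} \quad
  \| \phi(s) - \phi(s') \|_2 \le 1
  \;\; \forall (s, s') \in \mathcal{S}_{\mathrm{adj}}
```

Unlike mutual information, which is already maximized once skills are merely distinguishable, this measure depends on the underlying distance metric, so it rewards skills whose outcomes are far apart in temporal distance.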

Code & Models

Repositories

seohongpark/metra · tf · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Reinforcement Learning in Robotics