Task Adaptation from Skills: Information Geometry, Disentanglement, and New Objectives for Unsupervised Reinforcement Learning

Yucheng Yang; Tianyi Zhou; Qiang He; Lei Han; Mykola Pechenizkiy; Meng Fang

arXiv:2506.10629 · cs.LG · June 13, 2025

TL;DR

This paper provides a theoretical framework for unsupervised reinforcement learning, emphasizing the importance of skill diversity and separability, and introduces new metrics and objectives based on information geometry and Wasserstein distance to improve skill learning and downstream adaptation.

Contribution

It introduces a new disentanglement metric LSEPIN, connects it with downstream adaptation costs, and proposes Wasserstein-based objectives WSEP and PWSEP for improved skill learning in unsupervised RL.

Findings

01. WSEP helps discover more initial policies for downstream tasks.

02. Wasserstein distance improves geometric properties of skill learning.

03. PWSEP can theoretically find all optimal initial policies.
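As a rough illustration of why a Wasserstein distance can have friendlier geometry than KL divergence, the sketch below compares two pairs of state distributions with disjoint support. It is not the paper's WSEP or PWSEP objective; the helper names (`kl_divergence`, `wasserstein_1d`) and the toy 1-D state grid are assumptions made only for this example.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) in nats for two histograms on the same grid."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    # KL is infinite whenever p puts mass where q has none.
    if np.any(q[mask] == 0):
        return float("inf")
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def wasserstein_1d(p, q, positions):
    """Wasserstein-1 distance between two histograms on a sorted 1-D grid.

    Uses the 1-D identity W1 = integral of |CDF_p - CDF_q|.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    positions = np.asarray(positions, dtype=float)
    cdf_gap = np.abs(np.cumsum(p) - np.cumsum(q))[:-1]
    return float(np.sum(cdf_gap * np.diff(positions)))

# Hypothetical state distributions of three skills on a 5-cell line.
positions = np.arange(5.0)
skill_a = [1, 0, 0, 0, 0]
skill_near = [0, 1, 0, 0, 0]
skill_far = [0, 0, 0, 0, 1]

kl_near = kl_divergence(skill_a, skill_near)
kl_far = kl_divergence(skill_a, skill_far)
w_near = wasserstein_1d(skill_a, skill_near, positions)
w_far = wasserstein_1d(skill_a, skill_far, positions)
print(kl_near, kl_far, w_near, w_far)
```

In this toy case KL divergence is infinite for both pairs and so cannot tell the nearby skill from the distant one, while the Wasserstein distance grows with how far the mass has to move.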

Abstract

Unsupervised reinforcement learning (URL) aims to learn general skills for unseen downstream tasks. Mutual Information Skill Learning (MISL) addresses URL by maximizing the mutual information between states and skills but lacks sufficient theoretical analysis, e.g., how well its learned skills can initialize a downstream task's policy. Our new theoretical analysis in this paper shows that the diversity and separability of learned skills are fundamentally critical to downstream task adaptation but MISL does not necessarily guarantee these properties. To complement MISL, we propose a novel disentanglement metric LSEPIN. Moreover, we build an information-geometric connection between LSEPIN and downstream task adaptation cost. For better geometric properties, we investigate a new strategy that replaces the KL divergence in information geometry with Wasserstein distance. We extend the…
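To make the MISL objective concrete, here is a minimal sketch of the mutual information I(S; Z) between states and skills, computed from a small discrete joint table. The function name `mutual_information` and the two toy joint distributions are assumptions for illustration; practical MISL methods estimate this quantity with learned models rather than from an explicit table.

```python
import numpy as np

def mutual_information(joint):
    """I(S; Z) in nats from a joint table p(s, z); rows are states, columns are skills."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()              # normalise to a probability table
    p_s = joint.sum(axis=1, keepdims=True)   # marginal over states, shape (S, 1)
    p_z = joint.sum(axis=0, keepdims=True)   # marginal over skills, shape (1, Z)
    mask = joint > 0                         # skip zero cells (0 * log 0 = 0)
    ratio = joint[mask] / (p_s @ p_z)[mask]
    return float(np.sum(joint[mask] * np.log(ratio)))

# Hypothetical example: two skills that visit disjoint states ...
separable = np.array([[0.5, 0.0],
                      [0.0, 0.5]])
# ... versus two skills that visit the same states equally often.
overlapping = np.array([[0.25, 0.25],
                        [0.25, 0.25]])

mi_separable = mutual_information(separable)
mi_overlapping = mutual_information(overlapping)
print(mi_separable, mi_overlapping)
```

The separable table reaches the maximum of log 2 nats for two equally likely skills, and the overlapping table gives zero, which matches the intuition that mutual information rewards skills whose visited states identify them.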

Peer Reviews

Decision · ICLR 2024 spotlight

Reviewer 01 · Rating 8 (accept, good paper) · Confidence 4

Strengths

There is a long line of work on unsupervised skill learning based on mutual information maximization between states and skills, most of it motivated by intuition and empirical performance. This work complements Eysenbach et al. (2022) by providing a rigorous understanding of the properties of the learned skills, and it provides useful insights. The analysis presented in this work is novel, to the best of my knowledge, and comprises a fairly significant advancement of our understanding of

Weaknesses

This is a fairly strong submission which checks all the boxes. The only minor complaint is that the empirical results in Appendix H should ideally be a part of the main paper, since including them makes the submission more well-rounded and gives empirical validation for the results presented in Section 3.

Reviewer 02 · Rating 8 (accept, good paper) · Confidence 3

Strengths

The paper investigates an important topic and provides a rigorous analysis of the proposed ideas. In addition, it proposes a practical algorithm that was tested empirically and demonstrates superior results compared to existing MISL methods.

Weaknesses

The paper is hard to read and follow, with lots of details. Since most of the paper's contribution is placed in the appendix, including all experimental results, it is hard to understand and assess that contribution without carefully reading the appendix.

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

* The geometric perspective on task adaptation is interesting.
* The study of the geometry is promising and can motivate readers.
* The theoretical results are well justified.

Weaknesses

* Some terms are not well defined (diversity and separability).
* The flow of the paper is weakly organized, which hurts readability (the first topic is LSEPIN and the second is WSEP). The paper has no discussion of other perspectives.
* The findings from the theoretical derivation are not surprising. WAC and adaptation cost.
* Most of the content depends heavily on the appendix. Although the authors studied various components, they are not well organized.

Videos

Task Adaptation from Skills: Information Geometry, Disentanglement, and New Objectives for Unsupervised Reinforcement Learning · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques

Methods: Umbrella Reinforcement Learning