Can a MISL Fly? Analysis and Ingredients for Mutual Information Skill Learning

Chongyi Zheng; Jens Tuyls; Joanne Peng; Benjamin Eysenbach

arXiv:2412.08021·cs.LG·October 14, 2025

TL;DR

This paper analyzes mutual information skill learning (MISL) in reinforcement learning, showing that benefits of Wasserstein-based methods like METRA can be explained within MISL, and proposes a simpler contrastive successor features approach.

Contribution

It demonstrates that the advantages of Wasserstein-based methods can be understood within MISL and introduces a new contrastive successor features method with fewer components.

Findings

01. Contrastive successor features retain METRA's performance

02. Connections established between skill learning, contrastive learning, and successor features

03. Ablation studies reveal key ingredients for effective skill learning

Abstract

Self-supervised learning has the potential of lifting several of the key challenges in reinforcement learning today, such as exploration, representation learning, and reward design. Recent work (METRA) has effectively argued that moving away from mutual information and instead optimizing a certain Wasserstein distance is important for good performance. In this paper, we argue that the benefits seen in that paper can largely be explained within the existing framework of mutual information skill learning (MISL). Our analysis suggests a new MISL method (contrastive successor features) that retains the excellent performance of METRA with fewer moving parts, and highlights connections between skill learning, contrastive representation learning, and successor features. Finally, through careful ablation studies, we provide further insight into some of the key ingredients for both our method…
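The link the abstract draws between mutual information skill learning and contrastive representation learning can be illustrated with a generic InfoNCE-style estimator. The sketch below is not taken from the paper or its repository: the function name `infonce_bound`, the inner-product critic, and the idea of passing one representation vector per sampled skill are all assumptions made for illustration. It computes the standard contrastive lower bound `log N - loss` on the mutual information between paired representations and skills in a batch of size N.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def infonce_bound(reps, skills):
    """InfoNCE lower bound on I(representation; skill) from N paired samples.

    reps[i] and skills[i] form a positive pair; skills[j] for j != i act
    as negatives. The critic is a plain inner product (an assumption of
    this sketch, not a claim about any particular method).
    """
    n = len(reps)
    assert n == len(skills) and n > 0
    total = 0.0
    for i in range(n):
        logits = [dot(reps[i], skills[j]) for j in range(n)]
        # Log-probability of picking the true skill among the N candidates.
        total += logits[i] - logsumexp(logits)
    loss = -total / n           # cross-entropy, always >= 0
    return math.log(n) - loss   # bound, always <= log N
```

When the representations carry no information about the skills (for example, all-zero vectors), the bound is 0; when each representation aligns strongly with its own skill, the bound approaches its ceiling of log N.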

Peer Reviews

Decision·ICLR 2025 Oral

Reviewer 01 · Rating 8 · Confidence 2

Strengths

The paper provides a thorough theoretical analysis of METRA, reinterpreting it within the mutual information skill learning (MISL) framework. This helps demystify the method and connects it to well-established concepts like contrastive learning and information bottlenecks. The presentation is clear and to the point. The writing is excellent; I did not find typos or mistakes. The paper includes extensive empirical evaluations, comparing CSF with existing methods across various tasks. This robus

Weaknesses

While I appreciate that the paper is presented as an improvement on METRA, I would have preferred a presentation that introduces a new method and then shows it to be equivalent to METRA under certain conditions. Given that the presented method performs on par with METRA, it would also be nice to show where (if anywhere) one fails when the other succeeds. Perhaps partially observed MDPs, more interactive objects, or discrete action spaces would be key in identifying where exactly both methods st

Reviewer 02 · Rating 8 · Confidence 3

Strengths

- **[Technical soundness and novelty]** The technical soundness is robust; this work provides a thorough in-depth analysis of the METRA method, finding approximate equivalences with contrastive objectives and the information bottleneck. The analysis leads to a novel method that simplifies METRA, and I found no technical flaws; the method is both novel and solid.
- **[Evaluation]** The empirical evaluation effectively validates the hypotheses and theoretical analysis, enhancing the overall pers

Weaknesses

- **[About performances]** A significant question arises especially in the Quadruped experiments, where performance still shows room for improvement compared to METRA. Given that the proposed framework has a similar objective function, fewer hyperparameters, and avoids complex min-max optimization, why does the empirical performance (or at least the rate of convergence) not exceed that of METRA? Any discussion on this would be beneficial.
- **[About demonstrations]** It would be advantageous to

Reviewer 03 · Rating 6 · Confidence 3

Strengths

The paper is easy to follow and well-written, the analysis is sound and the experiments are relevant.

Weaknesses

The final model is very close to previous work and does not present substantial improvements over METRA on the different environments. This is overall acknowledged by the authors.

Code & Models

Repositories

Princeton-RL/contrastive-successor-features
tf · Official

Videos

Can a MISL Fly? Analysis and Ingredients for Mutual Information Skill Learning · SlidesLive

Taxonomy

Topics · Open Education and E-Learning