Can a MISL Fly? Analysis and Ingredients for Mutual Information Skill Learning
Chongyi Zheng, Jens Tuyls, Joanne Peng, Benjamin Eysenbach

TL;DR
This paper analyzes mutual information skill learning (MISL) in reinforcement learning, showing that benefits of Wasserstein-based methods like METRA can be explained within MISL, and proposes a simpler contrastive successor features approach.
Contribution
It demonstrates that the advantages of Wasserstein-based methods can be understood within MISL and introduces a new contrastive successor features method with fewer components.
Findings
Contrastive successor features retain METRA's performance
Connections established between skill learning, contrastive learning, and successor features
Ablation studies reveal key ingredients for effective skill learning
Abstract
Self-supervised learning has the potential of lifting several of the key challenges in reinforcement learning today, such as exploration, representation learning, and reward design. Recent work (METRA) has effectively argued that moving away from mutual information and instead optimizing a certain Wasserstein distance is important for good performance. In this paper, we argue that the benefits seen in that paper can largely be explained within the existing framework of mutual information skill learning (MISL). Our analysis suggests a new MISL method (contrastive successor features) that retains the excellent performance of METRA with fewer moving parts, and highlights connections between skill learning, contrastive representation learning, and successor features. Finally, through careful ablation studies, we provide further insight into some of the key ingredients for both our method…
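The link the abstract draws between skill learning and contrastive representation learning can be illustrated with a generic InfoNCE-style lower bound on the mutual information between skills and the transitions they produce. The sketch below is only an illustration under stated assumptions, not the paper's contrastive successor features implementation: the function name `infonce_lower_bound`, the inner-product critic, and the inputs `phi_diff` (a hypothetical representation difference phi(s') - phi(s)) and `z` (the skill vector) are all chosen here for illustration.

```python
import numpy as np


def infonce_lower_bound(phi_diff, z):
    """InfoNCE-style lower bound on I(transition; skill) from one batch.

    phi_diff: (B, d) array of hypothetical representation differences,
              phi(s') - phi(s), one row per transition (an assumption here).
    z:        (B, d) array holding the skill that generated each transition.
    """
    # Inner-product critic: entry [i, j] scores transition i against skill j.
    logits = phi_diff @ z.T
    # Subtract the row-wise max for numerical stability before exponentiating.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal; the other skills in the batch act
    # as negatives. Adding log(B) gives the usual InfoNCE bound, which can
    # never exceed log(B).
    batch_size = z.shape[0]
    return float(np.mean(np.diag(log_softmax)) + np.log(batch_size))
```

When transitions are informative about their skills, the diagonal entries dominate and the bound approaches log(B); when transitions carry no information about the skill, the bound falls to around zero or below. A skill-learning method in this family would train the representation and the policy to push this quantity up.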
Peer Reviews
Decision·ICLR 2025 Oral
The paper provides a thorough theoretical analysis of METRA, reinterpreting it within the mutual information skill learning (MISL) framework. This helps demystify the method and connects it to well-established concepts like contrastive learning and information bottlenecks. The presentation is clear and to the point. The writing is excellent; I did not find typos or mistakes. The paper includes extensive empirical evaluations, comparing CSF with existing methods across various tasks. This robus
While I appreciate that the paper is presented as an improvement on METRA, I would have preferred a presentation that introduces a new method and then shows it to be equivalent to METRA under certain conditions. Given that the presented method performs on par with METRA, it would also be nice to show where (if anywhere) one fails when the other succeeds. Perhaps partially observed MDPs, more interactive objects or discrete action spaces would be key in identifying where exactly both methods st
- **[Technical soundness and novelty]** The technical soundness is robust; this work provides a thorough in-depth analysis of the METRA method, finding approximate equivalences with contrastive objectives and the information bottleneck. The analysis leads to a novel method that simplifies METRA, and I found no technical flaws; the method is both novel and solid.
- **[Evaluation]** The empirical evaluation effectively validates the hypotheses and theoretical analysis, enhancing the overall pers
- **[About performances]** A significant question arises especially in the Quadruped experiments, where performance still shows room for improvement compared to METRA. Given that the proposed framework has a similar objective function, fewer hyperparameters, and avoids complex min-max optimization, why does the empirical performance (or at least the rate of convergence) not exceed that of METRA? Any discussion on this would be beneficial.
- **[About demonstrations]** It would be advantageous to
The paper is easy to follow and well-written, the analysis is sound and the experiments are relevant.
The final model is very close to previous work and does not present substantial improvements over METRA on the different environments. This is overall acknowledged by the authors.
Taxonomy
Topics: Open Education and E-Learning
