SUSD: Structured Unsupervised Skill Discovery through State Factorization

Seyed Mohammad Hadi Hosseini; Mahdieh Soleymani Baghshah

arXiv:2602.01619·cs.LG·February 3, 2026

Open Access · 3 Reviews

TL;DR

SUSD introduces a structured unsupervised skill discovery framework that factorizes environment states into independent components, enabling richer, more diverse, and disentangled skills for complex tasks.

Contribution

The paper proposes a novel factorized approach to unsupervised skill discovery that allocates distinct skills to environment factors, improving diversity and control.

Findings

01. Outperforms existing methods in diverse environments.

02. Discovers richer, more complex skills without supervision.

03. Enables fine-grained control over individual entities.

Abstract

Unsupervised Skill Discovery (USD) aims to autonomously learn a diverse set of skills without relying on extrinsic rewards. One of the most common USD approaches is to maximize the Mutual Information (MI) between skill latent variables and states. However, MI-based methods tend to favor simple, static skills due to their invariance properties, limiting the discovery of dynamic, task-relevant behaviors. Distance-Maximizing Skill Discovery (DSD) promotes more dynamic skills by leveraging state-space distances, yet still falls short in encouraging comprehensive skill sets that engage all controllable factors or entities in the environment. In this work, we introduce SUSD, a novel framework that harnesses the compositional structure of environments by factorizing the state space into independent components (e.g., objects or controllable entities). SUSD allocates distinct skill variables to…
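To make the idea of pairing each state factor with its own skill variable concrete, here is a minimal Python sketch. It is not the paper's implementation: it assumes a METRA-style directional reward, (φ(s') − φ(s))ᵀz, applied separately to each factor and summed. The function `factorized_reward`, the identity embeddings, the slice-based factorization, and the optional `weights` argument (a stand-in for any per-factor weighting) are all hypothetical names chosen for illustration.

```python
import numpy as np

def factorized_reward(s, s_next, skills, factor_slices, embeddings, weights=None):
    """Weighted sum over factors of (phi_i(s'_i) - phi_i(s_i)) . z_i."""
    if weights is None:
        weights = np.ones(len(factor_slices))  # equal weighting by default
    total = 0.0
    for sl, phi, z_i, w in zip(factor_slices, embeddings, skills, weights):
        # Displacement of this factor in its own embedding space.
        delta = phi(s_next[sl]) - phi(s[sl])
        # Reward movement along the direction chosen by this factor's skill.
        total += w * float(np.dot(delta, z_i))
    return total

# Hypothetical 4-D state: dims 0-1 are the agent, dims 2-3 are an object.
factor_slices = [slice(0, 2), slice(2, 4)]
embeddings = [lambda x: x, lambda x: x]  # identity stand-ins for learned phi_i
skills = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

s = np.zeros(4)
s_next = np.array([1.0, 0.0, 0.0, 2.0])
print(factorized_reward(s, s_next, skills, factor_slices, embeddings))  # 3.0
```

In this toy setup each factor contributes its own term, so a transition that moves only the agent earns nothing for the object's skill. That is one way to read the abstract's goal of engaging every controllable factor.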

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 5

Strengths

The paper is well written and easy to follow, especially in the method description and the experimental results. The proposed method of improving METRA's controllability over multiple objects is, to the best of my knowledge, novel in the field of unsupervised skill discovery. The ablations and extra experimental results in the appendix are thorough and provide good insight into each component's importance.

Weaknesses

My main concern is whether the empirical evaluation supports the paper's motivation: learning skills that engage all controllable factors in the environment. Specifically, in Sec. 5.4.1, the state space is continuous, so the number of unique states is infinite, and the policy may visit many unique states while covering only a small portion of the state space. It would be more appropriate to discretize the state space into bins and report the percentage of bins covered by the policy. I notice the authors men
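The binned coverage measure suggested in this review can be sketched in a few lines of Python. This is an illustrative reading of the suggestion, not the evaluation used in the paper: the function name `bin_coverage`, the uniform grid, and the state bounds `low` and `high` are assumptions.

```python
import numpy as np

def bin_coverage(states, low, high, bins_per_dim):
    """Fraction of grid cells visited at least once by the given states."""
    states = np.asarray(states, dtype=float)
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    # Rescale each coordinate to [0, 1], then map it to an integer bin index.
    scaled = (states - low) / (high - low)
    idx = np.floor(scaled * bins_per_dim).astype(int)
    # States on the upper boundary (or out of bounds) fall into the edge bins.
    idx = np.clip(idx, 0, bins_per_dim - 1)
    visited = {tuple(row) for row in idx}
    total_cells = bins_per_dim ** states.shape[1]
    return len(visited) / total_cells

# Three distinct states that all fall into the same cell of a 10 x 10 grid.
clustered = np.array([[0.01, 0.02], [0.03, 0.01], [0.02, 0.04]])
print(len(np.unique(clustered, axis=0)))                         # 3
print(bin_coverage(clustered, [0, 0], [1, 1], bins_per_dim=10))  # 0.01
```

The example illustrates the reviewer's point: the count of unique states is 3, yet only 1 of 100 cells is covered. Note that the number of cells grows exponentially with state dimension, so in practice such a metric would likely be computed over a low-dimensional projection, such as object positions.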

Reviewer 02 · Rating 6 · Confidence 4

Strengths

The paper proposes a method with strong empirical results, combining the strengths of skill disentanglement (e.g., DUSDi) and Distance-Maximizing Skill Discovery (e.g., METRA). Ablation studies and factorization sensitivity provide useful insight into component contributions. The factorized embedding formulation is clean and intuitively appealing; it aligns well with factored MDP structure. The curiosity-based factor weighting is a natural and well-motivated extension to encourage balanced skill

Weaknesses

My main concern with this paper is its novelty and conceptual contribution: while the idea of combining state factorization with distance-based skill discovery is sensible, it is not clear how much SUSD goes beyond a straightforward integration of DUSDi, CSD and METRA:
- The factorized skill structure is conceptually almost identical to DUSDi, except that it is applied within a DSD objective rather than a mutual-information one.
- The curiosity-based weighting resembles the controllability weig

Reviewer 03 · Rating 6 · Confidence 4

Strengths

Offers a novel combination of techniques (applying factored learning to distance-based skill discovery). Provides adequate empirical evidence that the method sufficiently handles load balancing across factors.

Weaknesses

Offers only a marginal change over existing methods in the unsupervised skill discovery space, since both factorization and skill learning have been tried before. Does not provide clear ablations showing which change produces the improvement. Provides a somewhat limited view of skill learning, focused only on unsupervised skill discovery.

Taxonomy

Topics: Multimodal Machine Learning Applications · Intelligent Tutoring Systems and Adaptive Learning · Reinforcement Learning in Robotics