InfoSculpt: Sculpting the Latent Space for Generalized Category Discovery

Wenwen Liao; Hang Ruan; Jianbo Yu; Yuansong Wang; Qingchao Jiang; Xiaofeng Yang

arXiv:2601.10098 · cs.CV · January 16, 2026

TL;DR

This paper introduces InfoSculpt, an information-theoretic framework that improves generalized category discovery by disentangling category signals from noise in the latent space, leading to more robust and accurate classification.

Contribution

We propose InfoSculpt, a novel method based on the Information Bottleneck principle that systematically sculpts the latent space for better category discovery in large-scale unlabeled data.

Findings

01. Outperforms existing methods on 8 benchmark datasets.

02. Effectively disentangles category information from instance noise.

03. Produces a more robust and discriminative representation space.

Abstract

Generalized Category Discovery (GCD) aims to classify instances from both known and novel categories within a large-scale unlabeled dataset, a critical yet challenging task for real-world, open-world applications. However, existing methods often rely on pseudo-labeling or two-stage clustering, which lack a principled mechanism to explicitly disentangle essential, category-defining signals from instance-specific noise. In this paper, we address this fundamental limitation by re-framing GCD from an information-theoretic perspective, grounded in the Information Bottleneck (IB) principle. We introduce InfoSculpt, a novel framework that systematically sculpts the representation space by minimizing a dual Conditional Mutual Information (CMI) objective. InfoSculpt uniquely combines a Category-Level CMI on labeled data to learn compact and discriminative representations for known classes, and…
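To make the quantity named in the abstract concrete, here is a minimal sketch of the textbook conditional mutual information I(X; Z | Y) for small discrete variables. It is not the paper's estimator: the abstract does not specify how InfoSculpt computes or bounds its CMI terms, and the function name `conditional_mutual_information` and the two toy distributions are assumptions made purely for illustration. Under one common reading of the IB principle, X is the input, Z the representation, and Y the category, so a small I(X; Z | Y) means Z keeps little about the instance beyond what its category already explains.

```python
import math
from collections import defaultdict


def conditional_mutual_information(joint):
    """Return I(X; Z | Y) in nats for a discrete joint pmf {(x, z, y): prob}.

    Textbook definition, used here only as an illustration:
    I(X; Z | Y) = sum p(x,z,y) * log( p(y) p(x,z,y) / (p(x,y) p(z,y)) ).
    """
    p_y = defaultdict(float)
    p_xy = defaultdict(float)
    p_zy = defaultdict(float)
    # Accumulate the marginals needed by the definition.
    for (x, z, y), p in joint.items():
        p_y[y] += p
        p_xy[(x, y)] += p
        p_zy[(z, y)] += p
    total = 0.0
    for (x, z, y), p in joint.items():
        if p > 0.0:  # zero-probability cells contribute nothing
            total += p * math.log(p * p_y[y] / (p_xy[(x, y)] * p_zy[(z, y)]))
    return total


# Hypothetical toy case 1: the representation Z copies the category Y,
# so it carries no instance-specific information once Y is known.
label_only = {(x, y, y): 0.25 for x in (0, 1) for y in (0, 1)}

# Hypothetical toy case 2: the representation Z copies the instance X,
# so all of X's variability leaks into Z even after conditioning on Y.
instance_copy = {(x, x, y): 0.25 for x in (0, 1) for y in (0, 1)}

cmi_label_only = conditional_mutual_information(label_only)
cmi_instance_copy = conditional_mutual_information(instance_copy)
```

In the first toy case the value is zero, and in the second it equals the entropy of X (log 2 nats). That contrast is the intuition behind driving a conditional term of this kind down so that representations of known classes become compact.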

Taxonomy

Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Topic Modeling