Multimodal Generalized Category Discovery
Yuchang Su, Renping Zhou, Siyu Huang, Xingjian Li, Tianyang Wang, Ziyue Wang, Min Xu

TL;DR
This paper extends Generalized Category Discovery to multimodal data, proposing a new framework that aligns heterogeneous information across modalities, leading to improved classification of known and novel categories.
Contribution
It introduces MM-GCD, a novel multimodal GCD framework that effectively aligns features and outputs across modalities using contrastive learning and distillation.
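The two alignment ideas named above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the symmetric form of both losses, and the temperature values are assumptions made for illustration, and the summary does not specify how MM-GCD weights or combines the terms. The first function pulls paired image and text features together with a contrastive (InfoNCE-style) loss; the second penalizes disagreement between the class distributions predicted from each modality, in the spirit of distillation.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length (guarding against the zero vector).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def log_softmax(logits):
    # Numerically stable log-softmax over a list of floats.
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def cross_modal_info_nce(img_feats, txt_feats, temperature=0.1):
    """Feature-space alignment (assumed form): symmetric InfoNCE where the
    i-th image and the i-th text are the positive pair."""
    img = [l2_normalize(v) for v in img_feats]
    txt = [l2_normalize(v) for v in txt_feats]
    n = len(img)
    # Cosine similarity between every image and every text, scaled by temperature.
    sim = [[sum(a * b for a, b in zip(img[i], txt[j])) / temperature
            for j in range(n)] for i in range(n)]
    loss_i2t = -sum(log_softmax(sim[i])[i] for i in range(n)) / n
    loss_t2i = -sum(log_softmax([sim[j][i] for j in range(n)])[i]
                    for i in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)


def output_distillation(img_logits, txt_logits, temperature=1.0):
    """Output-space alignment (assumed form): symmetric KL divergence between
    the class distributions predicted from each modality."""
    total = 0.0
    for zi, zt in zip(img_logits, txt_logits):
        lp = log_softmax([z / temperature for z in zi])
        lq = log_softmax([z / temperature for z in zt])
        kl_pq = sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))
        kl_qp = sum(math.exp(b) * (b - a) for a, b in zip(lp, lq))
        total += 0.5 * (kl_pq + kl_qp)
    return total / len(img_logits)


# Toy usage: two samples with 2-d features and 3-way logits per modality.
img_feats = [[1.0, 0.0], [0.0, 1.0]]
txt_feats = [[0.9, 0.1], [0.2, 0.8]]
feature_loss = cross_modal_info_nce(img_feats, txt_feats)
output_loss = output_distillation([[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]],
                                  [[1.5, 0.5, 0.0], [0.0, 1.0, 1.0]])
```

Both losses shrink as the modalities agree: the contrastive term is small when each image is most similar to its own text, and the distillation term is zero when the two modalities predict identical class distributions.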
Findings
Achieves state-of-the-art results on UPMC-Food101 and N24News datasets.
Surpasses previous methods by 11.5% and 4.7% in accuracy on these two datasets, respectively.
Addresses the challenge of aligning heterogeneous multimodal information.
Abstract
Generalized Category Discovery (GCD) aims to classify inputs into both known and novel categories, a task crucial for open-world scientific discoveries. However, current GCD methods are limited to unimodal data, overlooking the inherently multimodal nature of most real-world data. In this work, we extend GCD to a multimodal setting, where inputs from different modalities provide richer and complementary information. Through theoretical analysis and empirical validation, we identify that the key challenge in multimodal GCD lies in effectively aligning heterogeneous information across modalities. To address this, we propose MM-GCD, a novel framework that aligns both the feature and output spaces of different modalities using contrastive learning and distillation techniques. MM-GCD achieves new state-of-the-art performance on the UPMC-Food101 and N24News datasets, surpassing previous…
Taxonomy
Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Rough Sets and Fuzzy Logic
Methods: Contrastive Learning
