Multimodal Generalized Category Discovery

Yuchang Su; Renping Zhou; Siyu Huang; Xingjian Li; Tianyang Wang; Ziyue Wang; Min Xu

arXiv:2409.11624·cs.CV·September 19, 2024

Open Access

TL;DR

This paper extends Generalized Category Discovery to multimodal data, proposing a new framework that aligns heterogeneous information across modalities, leading to improved classification of known and novel categories.

Contribution

It introduces MM-GCD, a novel multimodal GCD framework that effectively aligns features and outputs across modalities using contrastive learning and distillation.
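
The two alignment ideas named above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes paired image and text embeddings, uses a generic symmetric InfoNCE loss for feature-space alignment, and uses a plain KL-divergence term for output-space distillation. The function names, the temperature values, and the choice of which modality acts as the teacher are all hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(z):
    """Numerically stable row-wise log-softmax."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def cross_modal_contrastive_loss(img_feats, txt_feats, temperature=0.1):
    """Symmetric InfoNCE: row i of each modality is the positive for row i
    of the other; all other rows in the batch act as negatives.
    (Hypothetical helper, a generic stand-in for feature-space alignment.)"""
    img = l2_normalize(img_feats)
    txt = l2_normalize(txt_feats)
    logits = img @ txt.T / temperature           # (batch, batch) similarities
    idx = np.arange(len(img))
    loss_i2t = -log_softmax(logits)[idx, idx].mean()    # image -> text
    loss_t2i = -log_softmax(logits.T)[idx, idx].mean()  # text -> image
    return 0.5 * (loss_i2t + loss_t2i)


def output_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Mean KL(teacher || student) over the batch, pulling one modality's
    class distribution toward the other's.
    (Hypothetical helper, a generic stand-in for output-space alignment.)"""
    log_p_s = log_softmax(student_logits / temperature)
    log_p_t = log_softmax(teacher_logits / temperature)
    p_t = np.exp(log_p_t)
    return (p_t * (log_p_t - log_p_s)).sum(axis=1).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.normal(size=(8, 16))               # toy image embeddings
    txt = img + 0.01 * rng.normal(size=(8, 16))  # nearly aligned text embeddings
    print("contrastive loss:", cross_modal_contrastive_loss(img, txt))
    print("distillation loss:",
          output_distillation_loss(rng.normal(size=(8, 5)),
                                   rng.normal(size=(8, 5))))
```

In a training loop, terms like these would typically be added to the usual classification or clustering objective with weighting coefficients; how MM-GCD weights or combines them is not stated on this page.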

Findings

01. Achieves state-of-the-art results on the UPMC-Food101 and N24News datasets.

02. Surpasses previous methods by 11.5% and 4.7% in accuracy.

03. Addresses the challenge of aligning heterogeneous multimodal information.

Abstract

Generalized Category Discovery (GCD) aims to classify inputs into both known and novel categories, a task crucial for open-world scientific discoveries. However, current GCD methods are limited to unimodal data, overlooking the inherently multimodal nature of most real-world data. In this work, we extend GCD to a multimodal setting, where inputs from different modalities provide richer and complementary information. Through theoretical analysis and empirical validation, we identify that the key challenge in multimodal GCD lies in effectively aligning heterogeneous information across modalities. To address this, we propose MM-GCD, a novel framework that aligns both the feature and output spaces of different modalities using contrastive learning and distillation techniques. MM-GCD achieves new state-of-the-art performance on the UPMC-Food101 and N24News datasets, surpassing previous…

Taxonomy

Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Rough Sets and Fuzzy Logic

Methods: Contrastive Learning