GraphVL: Graph-Enhanced Semantic Modeling via Vision-Language Models for Generalized Class Discovery

Bhupendra Solanki; Ashwin Nair; Mainak Singha; Souradeep Mukhopadhyay; Ankit Jha; Biplab Banerjee

arXiv:2411.02074·cs.CV·November 19, 2024

Open Access

TL;DR

GraphVL is a vision-language approach that combines a graph convolutional network (GCN) with CLIP to improve generalized category discovery, clustering both known and unknown classes with enhanced semantic modeling.

Contribution

The paper presents GraphVL, a new method integrating GCN and CLIP for better feature transfer and semantic modeling in generalized category discovery tasks.

Findings

01. Outperforms existing methods on seven benchmark datasets.

02. Effectively clusters known and unknown categories.

03. Reduces model bias towards known classes.

Abstract

Generalized Category Discovery (GCD) aims to cluster unlabeled images into known and novel categories using labeled images from known classes. To address the challenge of transferring features from known to unknown classes while mitigating model bias, we introduce GraphVL, a novel approach for vision-language modeling in GCD, leveraging CLIP. Our method integrates a graph convolutional network (GCN) with CLIP's text encoder to preserve class neighborhood structure. We also employ a lightweight visual projector for image data, ensuring discriminative features through margin-based contrastive losses for image-text mapping. This neighborhood preservation criterion effectively regulates the semantic space, making it less sensitive to known classes. Additionally, we learn textual prompts from known classes and align them to create a more contextually meaningful semantic feature space for the…
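The two mechanisms named in the abstract, graph-based smoothing of class text embeddings and a margin-based contrastive loss for image-text mapping, can be illustrated with a small dependency-free sketch. This is not the authors' implementation: the abstract does not specify how the class graph is built, the exact form of the loss, or the margin value. The k-nearest-neighbour graph, the parameter-free propagation step (no learned GCN weights), the hinge formulation, the margin of 0.2, the toy vectors standing in for CLIP embeddings, and all function names below are assumptions chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def knn_adjacency(embs, k):
    """Symmetric k-NN adjacency with self-loops (assumed graph construction)."""
    n = len(embs)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        sims = sorted(
            ((cosine(embs[i], embs[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for _, j in sims[:k]:
            adj[i][j] = adj[j][i] = 1.0
    return adj

def gcn_propagate(adj, embs):
    """One propagation step D^{-1/2} A D^{-1/2} X, without learned weights."""
    n, dim = len(adj), len(embs[0])
    deg = [sum(row) for row in adj]
    out = []
    for i in range(n):
        row = [0.0] * dim
        for j in range(n):
            if adj[i][j]:
                w = adj[i][j] / math.sqrt(deg[i] * deg[j])
                for d in range(dim):
                    row[d] += w * embs[j][d]
        out.append(row)
    return out

def margin_contrastive_loss(img, class_embs, label, margin=0.2):
    """Hinge loss: the true class should beat every other class by `margin`."""
    pos = cosine(img, class_embs[label])
    return sum(
        max(0.0, margin - pos + cosine(img, c))
        for idx, c in enumerate(class_embs)
        if idx != label
    )

# Toy stand-ins for class text embeddings (hypothetical values, not CLIP outputs).
class_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
adj = knn_adjacency(class_embs, k=1)
smoothed = gcn_propagate(adj, class_embs)

image_feature = [1.0, 0.0]  # hypothetical projected image feature
loss_right = margin_contrastive_loss(image_feature, smoothed, label=0)
loss_wrong = margin_contrastive_loss(image_feature, smoothed, label=2)
```

In this toy setting the propagation step pulls each class embedding toward its graph neighbours, which is one way to read the abstract's "class neighborhood structure", and the loss is lower when the image is paired with its matching class than with a mismatched one.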

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Semantic Web and Ontologies

Methods: ALIGN · Contrastive Language-Image Pre-training · Graph Convolutional Network