GraphVL: Graph-Enhanced Semantic Modeling via Vision-Language Models for Generalized Class Discovery
Bhupendra Solanki, Ashwin Nair, Mainak Singha, Souradeep Mukhopadhyay, Ankit Jha, Biplab Banerjee

TL;DR
GraphVL introduces a novel vision-language approach combining graph convolutional networks and CLIP to improve generalized category discovery, effectively clustering known and unknown classes with enhanced semantic understanding.
Contribution
The paper presents GraphVL, a new method integrating GCN and CLIP for better feature transfer and semantic modeling in generalized category discovery tasks.
Findings
Outperforms existing methods on seven benchmark datasets.
Effectively clusters known and unknown categories.
Reduces model bias towards known classes.
Abstract
Generalized Category Discovery (GCD) aims to cluster unlabeled images into known and novel categories using labeled images from known classes. To address the challenge of transferring features from known to unknown classes while mitigating model bias, we introduce GraphVL, a novel approach for vision-language modeling in GCD, leveraging CLIP. Our method integrates a graph convolutional network (GCN) with CLIP's text encoder to preserve class neighborhood structure. We also employ a lightweight visual projector for image data, ensuring discriminative features through margin-based contrastive losses for image-text mapping. This neighborhood preservation criterion effectively regulates the semantic space, making it less sensitive to known classes. Additionally, we learn textual prompts from known classes and align them to create a more contextually meaningful semantic feature space for the…
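The two mechanisms the abstract names, graph convolution over class-level text embeddings and a margin-based image-text loss, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The k-nearest-neighbour graph construction, the single ReLU layer, the hinge form of the loss, the margin value of 0.2, and all function names are assumptions made for illustration, and random vectors stand in for CLIP text and image features.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale vectors to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def knn_adjacency(class_emb, k=2):
    # Assumed graph construction: symmetric k-NN graph over class embeddings,
    # with self-loops added on the diagonal.
    z = l2_normalize(class_emb)
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)          # never pick a node as its own neighbour
    n = sim.shape[0]
    nbrs = np.argsort(-sim, axis=1)[:, :k]  # indices of the k most similar classes
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    adj = np.maximum(adj, adj.T)            # symmetrise
    return adj + np.eye(n)

def gcn_layer(adj, feats, weight):
    # One graph-convolution step: ReLU(D^-1/2 A D^-1/2 X W).
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    a_hat = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_hat @ feats @ weight, 0.0)

def margin_contrastive_loss(img_emb, class_emb, labels, margin=0.2):
    # Hinge-style loss (assumed form): the similarity to the true class should
    # exceed the similarity to every other class by at least `margin`.
    sim = l2_normalize(img_emb) @ l2_normalize(class_emb).T  # (batch, classes)
    idx = np.arange(len(labels))
    pos = sim[idx, labels][:, None]
    viol = np.maximum(0.0, margin + sim - pos)
    viol[idx, labels] = 0.0                 # the true class is not a negative
    return viol.sum(axis=1).mean()

# Toy run with random stand-ins for CLIP features.
rng = np.random.default_rng(0)
num_classes, dim = 5, 16
text_emb = rng.normal(size=(num_classes, dim))   # stand-in for text-encoder output
adj = knn_adjacency(text_emb, k=2)
weight = rng.normal(scale=0.1, size=(dim, dim))  # GCN weight, learnable in practice
class_emb = gcn_layer(adj, text_emb, weight)     # neighbourhood-smoothed class embeddings
img_emb = rng.normal(size=(4, dim))              # stand-in for projected image features
labels = np.array([0, 3, 1, 4])
loss = margin_contrastive_loss(img_emb, class_emb, labels)
```

In this sketch each class embedding is mixed with those of its graph neighbours before the loss is computed, which is one simple way to make the semantic space reflect class neighbourhood structure. How GraphVL actually builds its graph and combines its losses is not specified in the text above.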
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Semantic Web and Ontologies
Methods: ALIGN · Contrastive Language-Image Pre-training · Graph Convolutional Network
