TL;DR
OmniGCD introduces a modality-agnostic approach to generalized category discovery (GCD) that requires no dataset-specific fine-tuning. It is trained once on synthetic data and uses a Transformer-based model to improve zero-shot classification across multiple modalities.
Contribution
It proposes a novel modality-agnostic GCD method trained once on synthetic data, enabling zero-shot discovery across diverse datasets and modalities without fine-tuning.
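This page does not describe the synthetic training distribution. The sketch below only illustrates what a synthetic GCD episode could contain, assuming a Gaussian mixture. Some classes are treated as known, and a fraction of their samples are labeled. The remaining classes are novel and fully unlabeled. The function name `make_synthetic_gcd_episode` and all of its parameters are hypothetical and are not taken from OmniGCD.

```python
import numpy as np

def make_synthetic_gcd_episode(n_classes=10, n_known=5, n_per_class=50, dim=16,
                               label_frac=0.5, seed=0):
    """Sample one hypothetical synthetic GCD episode from a Gaussian mixture.

    Returns features X, true labels y, and a boolean mask `labeled` marking the
    samples whose labels are revealed. Only samples of classes 0..n_known-1
    (the "known" classes) can be labeled; the other classes are novel.
    """
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=5.0, size=(n_classes, dim))   # one center per class
    y = np.repeat(np.arange(n_classes), n_per_class)         # true class of each sample
    X = centers[y] + rng.normal(size=(y.size, dim))          # isotropic noise around each center
    is_known = y < n_known
    # Reveal labels for a random fraction of known-class samples only.
    labeled = is_known & (rng.random(y.size) < label_frac)
    return X, y, labeled
```

A model trained on many such episodes never sees a real dataset during training. Under this assumption, that is what allows it to be applied zero-shot later.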
Findings
Improves classification accuracy for known and novel classes across four modalities.
Performs zero-shot GCD on 16 datasets without dataset-specific fine-tuning.
Highlights the importance of strong encoders and decoupling representation learning from category discovery.
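GCD work usually reports accuracy on "known" (Old) and "novel" (New) classes. The protocol common in the GCD literature works as follows. Predicted clusters are matched one-to-one to ground-truth classes with the Hungarian algorithm over all samples. Accuracy is then split by whether a sample's true class is known. Whether OmniGCD uses exactly this protocol is an assumption here. The helper name `gcd_accuracy` is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def gcd_accuracy(y_true, y_pred, known_classes):
    """Clustering accuracy under the best one-to-one cluster-to-class matching,
    reported on all samples and separately on known ("old") and novel ("new") classes."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = int(max(y_true.max(), y_pred.max())) + 1
    # Contingency matrix: counts[p, t] = number of samples in cluster p with true class t.
    counts = np.zeros((n, n), dtype=np.int64)
    np.add.at(counts, (y_pred, y_true), 1)
    # Hungarian matching that maximizes the number of correctly matched samples.
    rows, cols = linear_sum_assignment(counts, maximize=True)
    mapping = np.empty(n, dtype=np.int64)
    mapping[rows] = cols
    correct = mapping[y_pred] == y_true
    old = np.isin(y_true, list(known_classes))
    return {"all": correct.mean(), "old": correct[old].mean(), "new": correct[~old].mean()}

print(gcd_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 0, 2], known_classes={0, 1}))
```

The matching is computed once over all classes rather than separately per split. As a result, a novel-class sample absorbed into a known-class cluster counts as an error in the "new" score.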
Abstract
Generalized Category Discovery (GCD) challenges methods to identify both known and novel classes from partially labeled data, mirroring how humans learn categories. Prior GCD methods operate within a single modality and require dataset-specific fine-tuning. In contrast, we propose a modality-agnostic GCD approach inspired by the way the human brain forms abstract categories. Our method uses modality-specific encoders (e.g., for vision, audio, text, and remote sensing) to process inputs, followed by dimension reduction. At test time, a novel Transformer-based model trained on synthetic data transforms the reduced representation into one better suited for clustering. To evaluate OmniGCD, we introduce an evaluation setting in which no dataset-specific fine-tuning is allowed, enabling modality-agnostic category discovery. Trained once…
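The pipeline described in the abstract runs through four stages: frozen encoder features, then dimension reduction, then a test-time transform, then clustering. The sketch below shows only the shape of that pipeline and rests on several assumptions. PCA stands in for the dimension-reduction step. OmniGCD's synthetically trained Transformer is *not* reproduced: `transform_for_clustering` is a hypothetical placeholder that merely L2-normalizes rows. The final step is semi-supervised k-means, a common GCD baseline in which labeled samples keep their class, swapped in for illustration. All function names are hypothetical.

```python
import numpy as np

def pca_reduce(Z, d):
    """Project encoder embeddings Z (n x D) onto their top-d principal components."""
    Zc = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:d].T

def transform_for_clustering(X):
    """Hypothetical placeholder for OmniGCD's Transformer; here it only L2-normalizes rows."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def semi_supervised_kmeans(X, y, labeled, k, n_iter=50):
    """k-means in which labeled samples stay fixed to their class's centroid.

    y is read only where `labeled` is True. Known-class centroids start at the
    labeled means; each extra (novel) centroid starts at the unlabeled point
    farthest from all existing centroids.
    """
    known = np.unique(y[labeled])
    centroids = [X[labeled & (y == c)].mean(axis=0) for c in known]
    candidates = X[~labeled]
    for _ in range(k - len(known)):
        d2 = ((candidates[:, None, :] - np.array(centroids)[None]) ** 2).sum(-1).min(axis=1)
        centroids.append(candidates[d2.argmax()])
    centroids = np.array(centroids)
    fixed = np.searchsorted(known, y[labeled])  # centroid index of each labeled sample's class
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centroids[None]) ** 2).sum(-1)
        assign = d2.argmin(axis=1)
        assign[labeled] = fixed                 # labeled samples never move
        for j in range(k):
            members = X[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return assign

def run_pipeline(Z, y, labeled, k, d=8):
    """Encoder embeddings -> PCA -> placeholder transform -> semi-supervised k-means."""
    return semi_supervised_kmeans(transform_for_clustering(pca_reduce(Z, d)), y, labeled, k)
```

In this sketch, swapping the placeholder for a learned model changes only `transform_for_clustering`. This mirrors the decoupling of representation learning from category discovery that the findings highlight.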
