Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment
Sheng Zhang, Muzammal Naseer, Guangyi Chen, Zhiqiang Shen, Salman Khan, Kun Zhang, Fahad Khan

TL;DR
This paper introduces S^3A, a novel framework for realistic zero-shot classification that leverages structural semantic information from unlabeled data and self-learning to improve accuracy without relying on annotated datasets.
Contribution
The paper proposes the S^3A framework, built around the Cluster-Vote-Prompt-Realign (CVPR) algorithm, which extracts structural semantics from unlabeled images and aligns them with a broad vocabulary. This enables zero-shot classification in open-world scenarios without annotations.
Findings
Achieves over 15% accuracy improvement over CLIP on average.
Effectively extracts structural semantics from unlabeled data.
Demonstrates superior performance across various benchmarks.
Abstract
Large-scale pre-trained Vision Language Models (VLMs) have proven effective for zero-shot classification. Despite the success, most traditional VLMs-based methods are restricted by the assumption of partial source supervision or ideal vocabularies, which rarely satisfy the open-world scenario. In this paper, we aim at a more challenging setting, Realistic Zero-Shot Classification, which assumes no annotation but instead a broad vocabulary. To address this challenge, we propose the Self Structural Semantic Alignment (S^3A) framework, which extracts the structural semantic information from unlabeled data while simultaneously self-learning. Our S^3A framework adopts a unique Cluster-Vote-Prompt-Realign (CVPR) algorithm, which iteratively groups unlabeled data to derive structural semantics for pseudo-supervision. Our CVPR process includes iterative clustering on images, voting within each…
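The first two CVPR steps named in the abstract, clustering images and voting within each cluster, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes image and vocabulary embeddings are already available as NumPy arrays (for example from a CLIP-style encoder), swaps in a simple spherical k-means with a deterministic farthest-point initialization, and uses a plain majority vote. The function names `l2_normalize`, `spherical_kmeans`, and `cluster_vote` are hypothetical. The Prompt and Realign steps are not sketched because the text above does not describe them.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def spherical_kmeans(x, k, iters=10):
    """Cluster unit-norm rows of x by cosine similarity (illustrative stand-in)."""
    # Farthest-point initialization: start from the first row, then repeatedly
    # add the row least similar to the centers chosen so far.
    centers = [x[0]]
    for _ in range(1, k):
        best_sim = np.max(x @ np.stack(centers).T, axis=1)
        centers.append(x[np.argmin(best_sim)])
    centers = np.stack(centers)

    for _ in range(iters):
        assign = np.argmax(x @ centers.T, axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # keep the old center if a cluster empties out
                centers[j] = members.mean(axis=0)
        centers = l2_normalize(centers)
    return np.argmax(x @ centers.T, axis=1)


def cluster_vote(image_emb, text_emb, k):
    """Return (cluster id, voted vocabulary index) for every image."""
    image_emb = l2_normalize(np.asarray(image_emb, dtype=float))
    text_emb = l2_normalize(np.asarray(text_emb, dtype=float))

    assign = spherical_kmeans(image_emb, k)
    # Per-image zero-shot guess: the most similar vocabulary entry.
    per_image = np.argmax(image_emb @ text_emb.T, axis=1)

    pseudo = np.empty(len(image_emb), dtype=int)
    for j in range(k):
        mask = assign == j
        if not mask.any():
            continue
        # Majority vote: the cluster adopts its most frequent per-image guess.
        votes = np.bincount(per_image[mask], minlength=len(text_emb))
        pseudo[mask] = votes.argmax()
    return assign, pseudo


# Toy data: three images near vocabulary entry 0, one ambiguous image whose
# own nearest entry is 1, and three images near vocabulary entry 2.
vocab = np.eye(3)
images = np.array([
    [1.0, 0.05, 0.0], [1.0, 0.0, 0.05], [0.9, 0.1, 0.0],
    [0.6, 0.7, 0.0],
    [0.0, 0.05, 1.0], [0.05, 0.0, 1.0], [0.0, 0.1, 0.9],
])
clusters, pseudo_labels = cluster_vote(images, vocab, k=2)
```

In this toy run the ambiguous fourth image lands in the same cluster as the first three, so the vote assigns it vocabulary entry 0 even though its individual nearest entry is 1. That is the intuition behind using group structure as pseudo-supervision rather than trusting each per-image prediction on its own.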
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI
Methods: Contrastive Language-Image Pre-training
