Towards Realistic Zero-Shot Classification via Self Structural Semantic Alignment

Sheng Zhang; Muzammal Naseer; Guangyi Chen; Zhiqiang Shen; Salman Khan; Kun Zhang; Fahad Khan

arXiv:2308.12960·cs.CV·December 27, 2023·1 cites

TL;DR

This paper introduces S^3A, a novel framework for realistic zero-shot classification that leverages structural semantic information from unlabeled data and self-learning to improve accuracy without relying on annotated datasets.

Contribution

The paper proposes the S^3A framework, built around the Cluster-Vote-Prompt-Realign (CVPR) algorithm for extracting and aligning structural semantics, enabling zero-shot classification in open-world scenarios without annotations.

Findings

01 Achieves over 15% accuracy improvement over CLIP on average.

02 Effectively extracts structural semantics from unlabeled data.

03 Demonstrates superior performance across various benchmarks.

Abstract

Large-scale pre-trained Vision Language Models (VLMs) have proven effective for zero-shot classification. Despite this success, most traditional VLM-based methods are restricted by the assumption of partial source supervision or ideal vocabularies, assumptions that rarely hold in open-world scenarios. In this paper, we aim at a more challenging setting, Realistic Zero-Shot Classification, which assumes no annotation but instead a broad vocabulary. To address this challenge, we propose the Self Structural Semantic Alignment (S^3A) framework, which extracts structural semantic information from unlabeled data while simultaneously self-learning. Our S^3A framework adopts a unique Cluster-Vote-Prompt-Realign (CVPR) algorithm, which iteratively groups unlabeled data to derive structural semantics for pseudo-supervision. Our CVPR process includes iterative clustering on images, voting within each…
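The abstract is cut off after the voting step, so only the first two stages it names (clustering images, then voting within each cluster) can be illustrated here. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes image and vocabulary embeddings from a CLIP-like model are already available as NumPy arrays, uses a simple spherical k-means as a stand-in clustering method, and lets each image vote for its `top_m` most similar vocabulary entries. The function names, `k`, and `top_m` are hypothetical, and the Prompt and Realign stages are not covered.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def spherical_kmeans(x, k, iters=20, seed=0):
    """Assumed stand-in clustering: k-means on unit vectors with cosine similarity."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmax(x @ centers.T, axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
        centers = l2_normalize(centers)
    return np.argmax(x @ centers.T, axis=1)


def cluster_and_vote(image_emb, text_emb, k, top_m=3):
    """Group images, then give every cluster the vocabulary entry with most votes.

    image_emb: (n_images, dim) image embeddings (assumed precomputed).
    text_emb:  (vocab_size, dim) embeddings of the broad vocabulary.
    Returns (cluster id per image, pseudo-label index per image).
    """
    image_emb = l2_normalize(np.asarray(image_emb, dtype=float))
    text_emb = l2_normalize(np.asarray(text_emb, dtype=float))
    clusters = spherical_kmeans(image_emb, k)

    # Each image votes for its top_m most similar vocabulary entries.
    sims = image_emb @ text_emb.T
    top = np.argsort(-sims, axis=1)[:, :top_m]

    pseudo = np.full(len(image_emb), -1, dtype=int)
    for j in np.unique(clusters):
        idx = np.where(clusters == j)[0]
        votes = np.bincount(top[idx].ravel(), minlength=text_emb.shape[0])
        pseudo[idx] = votes.argmax()  # the whole cluster shares one pseudo-label
    return clusters, pseudo
```

Voting at the cluster level rather than per image means a single noisy image-text match cannot decide a label on its own, which is one plausible reading of how grouping "derives structural semantics for pseudo-supervision".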

Code & Models

Repositories

sheng-eatamath/s3a (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI

Methods: Contrastive Language-Image Pre-training