Multi-method Integration with Confidence-based Weighting for Zero-shot Image Classification

Siqi Yin; Lifan Jiang

arXiv:2405.02155·cs.CV·May 6, 2024

Open Access

TL;DR

This paper presents a multi-method zero-shot image classification framework that leverages ChatGPT, DALL-E, CLIP, and DINO with confidence-based weighting to improve accuracy on standard datasets.

Contribution

It introduces a novel integration framework combining multiple models and alignment strategies with adaptive confidence weighting for enhanced zero-shot learning performance.

Findings

01 Achieves over 96% AUROC on CIFAR-10, CIFAR-100, and TinyImageNet.

02 Surpasses 99% AUROC on CIFAR-10.

03 Significantly outperforms single-model approaches.

Abstract

This paper introduces a novel framework for zero-shot learning (ZSL), i.e., recognizing new categories that are unseen during training, using a multi-model and multi-alignment integration method. Specifically, we propose three strategies to improve the model's performance on ZSL: 1) using the extensive knowledge of ChatGPT and the powerful image generation capabilities of DALL-E to create reference images that precisely describe unseen categories and classification boundaries, thereby alleviating the information bottleneck issue; 2) integrating the results of text-image alignment and image-image alignment from CLIP, along with the image-image alignment results from DINO, to achieve more accurate predictions; 3) introducing an adaptive weighting mechanism based on confidence levels to aggregate the outcomes from the different prediction methods. Experimental results on…
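The third strategy, confidence-based aggregation, can be illustrated with a minimal sketch. The abstract as shown here does not give the weighting formula, so the code below assumes that each method's confidence is its maximum softmax probability and that the weights are those confidences normalized to sum to one. The function names and the similarity scores are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Convert raw similarity scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_weighted_fusion(score_sets):
    """Fuse per-method class scores, weighting each method by its confidence.

    Assumption (not from the paper): a method's confidence is the largest
    probability it assigns to any class.
    """
    probs = [softmax(s) for s in score_sets]
    confidences = [max(p) for p in probs]
    total_conf = sum(confidences)
    weights = [c / total_conf for c in confidences]
    n_classes = len(probs[0])
    fused = [
        sum(w * p[k] for w, p in zip(weights, probs))
        for k in range(n_classes)
    ]
    return fused, weights


# Hypothetical similarity scores over three candidate classes.
clip_text_image = [2.0, 0.5, 0.1]    # CLIP text-image alignment
clip_image_image = [0.9, 1.0, 0.8]   # CLIP image-image alignment (uncertain)
dino_image_image = [3.0, 0.2, 0.1]   # DINO image-image alignment

fused, weights = confidence_weighted_fusion(
    [clip_text_image, clip_image_image, dino_image_image]
)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

In this sketch the near-uniform CLIP image-image scores receive the smallest weight, so the two more confident methods dominate the fused prediction. Other confidence measures, such as entropy or the margin between the top two classes, would fit the same structure.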

Taxonomy

Topics: Image Processing Techniques and Applications · Advanced Image Processing Techniques · Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Dense Connections · Residual Connection · Softmax · Vision Transformer · self-DIstillation with NO labels · Contrastive Language-Image Pre-training