A Simple and Efficient Baseline for Zero-Shot Generative Classification
Zipeng Qi, Buhua Liu, Shiyan Zhang, Bao Li, Zhiqiang Xu, Haoyi Xiong, Zeke Xie

TL;DR
This paper introduces a simple, fast, and accurate zero-shot diffusion classifier built on pretrained models. On ImageNet it is more than 10 points more accurate and more than 30,000 times faster than previous zero-shot diffusion classifiers.
Contribution
It proposes the first zero-shot diffusion classifier that combines high accuracy with practical speed, leveraging pretrained text-to-image diffusion models and DINOv2.
Findings
Surpasses previous zero-shot diffusion classifiers by over 10 points on ImageNet.
Speeds up classification by more than 30,000 times.
Achieves competitive zero-shot performance across various datasets.
Abstract
Large diffusion models have become mainstream generative models in both academic studies and industrial AIGC applications. Recently, a number of works have explored how to use large diffusion models as zero-shot classifiers. While recent zero-shot diffusion-based classifiers have improved performance on benchmark datasets, they still suffer from extremely slow classification (e.g., ~1000 seconds to classify a single image on ImageNet). This slowness severely limits the practical use of existing zero-shot diffusion-based classifiers. In this paper, we propose an embarrassingly simple and efficient zero-shot Gaussian Diffusion Classifier (GDC) built on pretrained text-to-image diffusion models and DINOv2. The proposed GDC not only significantly surpasses previous zero-shot diffusion-based classifiers by over 10…
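The abstract names the ingredients (a pretrained text-to-image diffusion model, DINOv2, and a Gaussian classifier), but the text here is cut off before it describes how they fit together. The sketch below is therefore only one plausible reading, not the authors' implementation: it assumes that feature vectors of class-conditioned synthetic images are available, fits one Gaussian per class with a shared covariance, and assigns a query to the class with the smallest Mahalanobis distance. The function names `fit_gaussians` and `classify`, the `shrinkage` parameter, and the random stand-in features are all hypothetical; no diffusion model or DINOv2 call is made.

```python
import numpy as np

def fit_gaussians(features, labels, shrinkage=1e-3):
    """Fit per-class means and one shared covariance (an assumed, LDA-style model)."""
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # Center each sample on its own class mean before pooling the covariance.
    centered = features - means[np.searchsorted(classes, labels)]
    cov = centered.T @ centered / len(features)
    cov += shrinkage * np.eye(features.shape[1])  # keep the matrix invertible
    return classes, means, np.linalg.inv(cov)

def classify(queries, classes, means, precision):
    """Pick the class whose mean is closest in Mahalanobis distance."""
    diff = queries[:, None, :] - means[None, :, :]             # (n, k, d)
    dist = np.einsum("nkd,de,nke->nk", diff, precision, diff)  # (n, k)
    return classes[np.argmin(dist, axis=1)]

# Stand-in data: random vectors play the role of image features.
# In the assumed pipeline these would come from a feature extractor
# applied to images synthesized from class-name prompts.
rng = np.random.default_rng(0)
num_classes, per_class, dim = 3, 50, 8
true_means = rng.normal(scale=5.0, size=(num_classes, dim))
labels = np.repeat(np.arange(num_classes), per_class)
features = true_means[labels] + rng.normal(size=(len(labels), dim))

classes, means, precision = fit_gaussians(features, labels)
queries = true_means + 0.1 * rng.normal(size=true_means.shape)
predictions = classify(queries, classes, means, precision)
print(predictions)
```

With equal class priors and a shared covariance, the smallest Mahalanobis distance corresponds to the highest Gaussian log-likelihood. Under this reading, classifying a query costs one feature extraction plus a small matrix product, whereas the slow classifiers the abstract describes take on the order of 1000 seconds per image.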
Taxonomy
Topics: Machine Learning and Data Classification
Methods: Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
