A Simple and Efficient Baseline for Zero-Shot Generative Classification

Zipeng Qi; Buhua Liu; Shiyan Zhang; Bao Li; Zhiqiang Xu; Haoyi Xiong; Zeke Xie

arXiv:2412.12594 · cs.CV · December 18, 2024

Open Access

TL;DR

This paper introduces a simple, fast, and accurate zero-shot diffusion classifier built on pretrained models. On ImageNet, it improves accuracy by over 10 points and runs more than 30,000 times faster than previous zero-shot diffusion classifiers.

Contribution

It proposes the first zero-shot diffusion classifier that combines high accuracy with practical speed, leveraging pretrained text-to-image diffusion models and DINOv2.

Findings

01. Surpasses previous zero-shot diffusion classifiers by over 10 points on ImageNet.

02. Speeds up classification by more than 30,000 times.

03. Achieves competitive zero-shot performance across various datasets.

Abstract

Large diffusion models have become mainstream generative models in both academic research and industrial AIGC applications. Recently, a number of works have explored how to use large diffusion models as zero-shot classifiers. While recent zero-shot diffusion-based classifiers have improved performance on benchmark datasets, they still suffer from extremely slow classification (e.g., ~1000 seconds to classify a single image on ImageNet). This slowness prevents existing zero-shot diffusion-based classifiers from being used in practical applications. In this paper, we propose embarrassingly simple and efficient zero-shot Gaussian Diffusion Classifiers (GDC) built on pretrained text-to-image diffusion models and DINOv2. The proposed GDC can not only significantly surpass previous zero-shot diffusion-based classifiers by over 10…
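The abstract excerpt does not spell out how GDC works, so the following is only a rough sketch of one plausible reading of the name: fit a Gaussian per class in a feature space, then assign a query to the class with the highest likelihood. Everything here is an assumption for illustration. The random vectors stand in for DINOv2 embeddings of images that a text-to-image diffusion model might synthesize for each class prompt, and the shared covariance, the shrinkage value, and the function names `fit_class_gaussians` and `classify` are hypothetical choices, not the authors' implementation.

```python
import numpy as np


def fit_class_gaussians(feats, labels, num_classes, shrinkage=1e-3):
    """Fit one mean per class and a single shared covariance.

    The shared covariance and the shrinkage term are illustrative
    assumptions, not details taken from the paper.
    """
    dim = feats.shape[1]
    means = np.stack(
        [feats[labels == c].mean(axis=0) for c in range(num_classes)]
    )
    centered = feats - means[labels]
    cov = centered.T @ centered / len(feats)
    cov += shrinkage * np.eye(dim)  # keep the covariance invertible
    return means, np.linalg.inv(cov)


def classify(x, means, precision):
    """Return the class with the smallest Mahalanobis distance.

    With equal priors and a shared covariance, this is the class with
    the highest Gaussian log-likelihood.
    """
    diff = x[:, None, :] - means[None, :, :]  # (n, classes, dim)
    maha = np.einsum("ncd,de,nce->nc", diff, precision, diff)
    return np.argmin(maha, axis=1)


# Synthetic stand-ins for image features (NOT real DINOv2 embeddings).
rng = np.random.default_rng(0)
num_classes, dim, per_class = 3, 8, 50
true_means = rng.normal(scale=5.0, size=(num_classes, dim))

labels = np.repeat(np.arange(num_classes), per_class)
feats = true_means[labels] + rng.normal(size=(len(labels), dim))
means, precision = fit_class_gaussians(feats, labels, num_classes)

test_labels = np.repeat(np.arange(num_classes), 10)
test_feats = true_means[test_labels] + rng.normal(size=(len(test_labels), dim))
preds = classify(test_feats, means, precision)
accuracy = float((preds == test_labels).mean())
print(f"accuracy on synthetic features: {accuracy:.2f}")
```

In a sketch like this, classifying a query costs one feature extraction plus a distance to each class mean, rather than many diffusion denoising passes per candidate class. That may be the kind of saving behind the reported speedup, though the excerpt does not confirm it.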

Taxonomy

Topics: Machine Learning and Data Classification

Methods: Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings