CLIP Embeddings for AI-Generated Image Detection: A Few-Shot Study with Lightweight Classifier

Ziyang Ou

arXiv:2505.10664·cs.CV·May 19, 2025

Open Access

TL;DR

This study explores the use of CLIP embeddings combined with lightweight classifiers to detect AI-generated images, achieving high accuracy with minimal training data and highlighting challenges with certain image styles.

Contribution

It demonstrates that CLIP embeddings can effectively identify AI-generated images with a simple, few-shot learning approach, revealing new insights into the limitations of current detection methods.

Findings

01

Achieves 95% accuracy on CIFAKE without language reasoning

02

Reaches 85% accuracy after few-shot adaptation with only 20% of the data

03

Certain image types, such as wide-angle photos and oil paintings, remain difficult to classify

Abstract

Distinguishing AI-generated images from authentic ones is a growing challenge on social media platforms. While vision-language models (VLMs) like CLIP excel at multimodal representation, their capacity for AI-generated image classification is underexplored because such labels are absent from the pre-training process. This work investigates whether CLIP embeddings inherently contain information indicative of AI generation. The proposed pipeline extracts visual embeddings using a frozen CLIP model, feeds the embeddings to lightweight networks, and fine-tunes only the final classifier. Experiments on the public CIFAKE benchmark show that performance reaches 95% accuracy without language reasoning. Few-shot adaptation to a curated custom dataset with 20% of the data brings performance to 85%. A closed-source baseline (Gemini-2.0) has the best zero-shot accuracy yet fails on specific…
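
The pipeline described in the abstract (a frozen encoder, a lightweight head, and training restricted to that head) can be illustrated with a minimal sketch. This is not the author's implementation. To stay dependency-free it replaces CLIP with a hypothetical stand-in encoder (a fixed random projection) and the "lightweight networks" with a single logistic-regression head. The toy data, dimensions, learning rate, and all function names are assumptions made for illustration only.

```python
import math
import random


def make_frozen_encoder(in_dim, emb_dim, seed=0):
    """Hypothetical stand-in for a frozen CLIP image encoder (NOT CLIP itself).

    A fixed random projection followed by tanh and L2 normalisation.
    The weights are created once and never updated afterwards.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.gauss(0.0, scale) for _ in range(in_dim)]
               for _ in range(emb_dim)]

    def encode(x):
        emb = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]
        norm = math.sqrt(sum(e * e for e in emb)) or 1.0
        return [e / norm for e in emb]

    return encode


def predict_proba(head, emb):
    """Probability of the positive class under a logistic-regression head."""
    w, b = head
    z = sum(wj * ej for wj, ej in zip(w, emb)) + b
    return 1.0 / (1.0 + math.exp(-z))


def train_head(embeddings, labels, epochs=200, lr=0.5):
    """Full-batch gradient descent on the head only; the encoder is untouched."""
    dim, n = len(embeddings[0]), len(embeddings)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for emb, y in zip(embeddings, labels):
            err = predict_proba((w, b), emb) - y
            for j in range(dim):
                grad_w[j] += err * emb[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy data (assumption): 8-dimensional "images"; label 1 stands for
# AI-generated and label 0 for real. The two classes differ only in their mean.
rng = random.Random(1)


def toy_image(label):
    mean = 1.0 if label == 1 else -1.0
    return [rng.gauss(mean, 0.5) for _ in range(8)]


labels = [i % 2 for i in range(200)]
encoder = make_frozen_encoder(in_dim=8, emb_dim=16)
embeddings = [encoder(toy_image(y)) for y in labels]  # computed once, then reused

# Train the head on the first 160 embeddings and evaluate on the last 40.
head = train_head(embeddings[:160], labels[:160])
held_out = list(zip(embeddings[160:], labels[160:]))
accuracy = sum((predict_proba(head, e) > 0.5) == bool(y)
               for e, y in held_out) / len(held_out)
print(f"held-out accuracy on toy data: {accuracy:.2f}")
```

Because the encoder never changes, the embeddings can be computed once and cached, which is what keeps this kind of setup cheap to train. In a real setting the stand-in encoder would be replaced by an actual CLIP image encoder, and the accuracy printed here says nothing about the paper's reported numbers.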

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning

Methods: Contrastive Language-Image Pre-training