TL;DR
This paper shows that simple linear classifiers trained on frozen features from modern vision foundation models outperform complex detectors at real-world AI-generated image detection, with the largest gains on challenging in-the-wild datasets.
Contribution
Demonstrates that simple linear classifiers on frozen pre-trained features achieve state-of-the-art generalization in AI-generated image (AIGI) detection, highlighting the emergent forensic capabilities of foundation models.
Findings
Linear classifiers match specialized detectors on benchmarks.
Outperform specialized detectors on in-the-wild datasets, with accuracy gains of over 30%.
Pretraining on synthetic content enables models to internalize forgery concepts.
Abstract
While specialized detectors for AI-Generated Images (AIGI) achieve near-perfect accuracy on curated benchmarks, they suffer a dramatic performance collapse in realistic, in-the-wild scenarios. In this work, we demonstrate that simplicity prevails over complex architectural designs. A simple linear classifier trained on the frozen features of modern Vision Foundation Models, including Perception Encoder, MetaCLIP 2, and DINOv3, establishes a new state of the art. Through a comprehensive evaluation spanning traditional benchmarks, unseen generators, and challenging in-the-wild distributions, we show that this baseline not only matches specialized detectors on standard benchmarks but also decisively outperforms them on in-the-wild datasets, boosting accuracy by margins of over 30%. We posit that this superior capability is an emergent property driven by the massive scale of…
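The baseline described in the abstract is a linear probe: a frozen encoder produces features, and only a linear classifier on top is trained. The sketch below illustrates that setup under loudly stated assumptions. `frozen_encoder` is a hypothetical stand-in (a fixed random projection plus `tanh`) for a real pretrained model such as Perception Encoder, MetaCLIP 2, or DINOv3. The "images" are synthetic vectors. The head is plain logistic regression trained by gradient descent. None of these names, hyperparameters, or data come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a frozen vision foundation model: a fixed random
# projection plus a nonlinearity. ENCODER_W is created once and never updated,
# mirroring the "frozen features" setup. A real pipeline would call a
# pretrained encoder here instead.
INPUT_DIM, FEAT_DIM = 64, 32
ENCODER_W = rng.normal(0.0, 1.0 / np.sqrt(INPUT_DIM), size=(INPUT_DIM, FEAT_DIM))


def frozen_encoder(x: np.ndarray) -> np.ndarray:
    """Map inputs to fixed features; nothing in here is trained."""
    return np.tanh(x @ ENCODER_W)


def train_linear_probe(feats, labels, lr=0.5, epochs=500, l2=1e-4):
    """Fit a logistic-regression head (the only trainable part) on fixed features."""
    n, d = feats.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
        err = probs - labels  # per-sample gradient of binary cross-entropy w.r.t. the logit
        w -= lr * (feats.T @ err / n + l2 * w)  # mean gradient step with L2 penalty
        b -= lr * err.mean()
    return w, b


def predict(feats, w, b):
    """Label 1 = AI-generated, 0 = real (a convention assumed for this sketch)."""
    return (feats @ w + b > 0).astype(int)


def make_toy_split(n, shift=0.6):
    """Synthetic 'images': generated samples are real-like vectors with a mean shift."""
    labels = rng.integers(0, 2, size=n)
    x = rng.normal(size=(n, INPUT_DIM)) + shift * labels[:, None]
    return x, labels


x_train, y_train = make_toy_split(1000)
x_test, y_test = make_toy_split(1000)

# Features are extracted once with the frozen encoder; only the linear head is fit.
w, b = train_linear_probe(frozen_encoder(x_train), y_train)
accuracy = (predict(frozen_encoder(x_test), w, b) == y_test).mean()
print(f"held-out accuracy: {accuracy:.3f}")
```

In this sketch, `ENCODER_W` is never updated. Only the `FEAT_DIM + 1` parameters of the linear head are learned. Swapping in a real foundation model would, under these assumptions, change only `frozen_encoder`.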
