A Sanity Check for AI-generated Image Detection
Shilin Yan, Ouxiang Li, Jiayin Cai, Yanbin Hao, Xiaolong Jiang, Yao Hu, Weidi Xie

TL;DR
This paper evaluates existing AI-generated image detectors, introduces a challenging new dataset called Chameleon, and proposes AIDE, a hybrid-feature detector that improves detection accuracy; even so, the problem remains unsolved.
Contribution
The paper presents Chameleon, a challenging dataset for AI-generated image detection, and introduces AIDE, a hybrid model combining high-level semantics and low-level artifacts for improved detection.
Findings
Most existing detectors fail on the Chameleon dataset.
AIDE outperforms state-of-the-art methods by 3.5% and 4.6% on the AIGCDetectBenchmark and GenImage benchmarks, respectively.
AI-generated image detection remains a challenging problem.
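The failure mode behind the first finding is easier to see when accuracy is split by class. The abstract reports that almost all models classify AI-generated images as real ones; a detector that does so can still post a moderate overall accuracy while catching no fakes. The following is a minimal sketch with made-up labels and predictions (not data from the paper); the function name `per_class_accuracy` and the 0 = real / 1 = AI-generated encoding are assumptions for illustration.

```python
def per_class_accuracy(labels, preds):
    """Return accuracy on real images, on fake images, and overall.

    Assumed encoding (illustrative): 0 = real, 1 = AI-generated.
    """
    stats = {}
    for name, cls in (("real", 0), ("fake", 1)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        # Fraction of this class that the detector labels correctly.
        stats[name] = (
            sum(preds[i] == cls for i in idx) / len(idx) if idx else float("nan")
        )
    stats["overall"] = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    return stats


# Hypothetical toy set: 6 real images, 4 AI-generated images.
labels = [0] * 6 + [1] * 4
# Hypothetical detector that calls every image "real".
preds = [0] * 10

result = per_class_accuracy(labels, preds)
print(result)
```

On this toy input the overall accuracy is 0.6 even though accuracy on the fake class is 0.0, which is why reporting the two classes separately gives a clearer picture than a single number.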
Abstract
With the rapid development of generative models, discerning AI-generated content has evoked increasing attention from both industry and academia. In this paper, we conduct a sanity check on "whether the task of AI-generated image detection has been solved". To start with, we present the Chameleon dataset, consisting of AI-generated images that are genuinely challenging for human perception. To quantify the generalization of existing methods, we evaluate 9 off-the-shelf AI-generated image detectors on the Chameleon dataset. Upon analysis, almost all models classify AI-generated images as real ones. Later, we propose AIDE (AI-generated Image DEtector with Hybrid Features), which leverages multiple experts to simultaneously extract visual artifacts and noise patterns. Specifically, to capture the high-level semantics, we utilize CLIP to compute the visual embedding. This effectively enables the model…
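The hybrid-feature idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the paper uses CLIP for the high-level embedding, whereas the `semantic_features` function below is only a coarse average-pooling stand-in, and `noise_features` is an assumed simple high-pass residual chosen for illustration. All function names, the grayscale list-of-rows image type, and the test images are hypothetical.

```python
from statistics import mean, pvariance

Image = list[list[float]]  # grayscale image as rows of pixel intensities


def semantic_features(img: Image, grid: int = 2) -> list[float]:
    """Stand-in for a high-level embedding (the paper uses CLIP; this is NOT CLIP).

    Average-pools the image into a grid x grid summary of coarse layout.
    Assumes the image is at least grid x grid pixels.
    """
    h, w = len(img), len(img[0])
    feats = []
    for gy in range(grid):
        for gx in range(grid):
            rows = range(gy * h // grid, (gy + 1) * h // grid)
            cols = range(gx * w // grid, (gx + 1) * w // grid)
            feats.append(mean(img[y][x] for y in rows for x in cols))
    return feats


def noise_features(img: Image) -> list[float]:
    """Low-level expert (assumed form): high-pass residual statistics.

    Residual = pixel minus the mean of its 4 neighbours, summarised by
    mean absolute value and variance. Assumes the image is at least 3 x 3.
    """
    h, w = len(img), len(img[0])
    residuals = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            neighbours = (
                img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
            ) / 4.0
            residuals.append(img[y][x] - neighbours)
    return [mean(abs(r) for r in residuals), pvariance(residuals)]


def hybrid_features(img: Image) -> list[float]:
    """Concatenate the two feature groups into one vector for a classifier."""
    return semantic_features(img) + noise_features(img)


# Two hypothetical 8x8 images with the same coarse content but different texture.
flat = [[0.5] * 8 for _ in range(8)]
checker = [[float((x + y) % 2) for x in range(8)] for y in range(8)]

print(hybrid_features(flat))
print(hybrid_features(checker))
```

In this toy case the coarse features are identical for both images and only the residual statistics differ, which hints at why a detector might want both kinds of signal. A real system would feed the fused vector to a trained classifier.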
Peer Reviews
Decision: ICLR 2025 Poster
1. New datasets and evaluations that show the real status of AI-generated image detection are significant. This work shows that this task is far from solved. 2. The proposed Chameleon dataset is of high quality and drawn from real-world distributions/sources. 3. The proposed method is straightforward and effective. It combines both high-level semantic features and low-level pixel features. 4. The experiments show the effectiveness of the method and the hardness of the new dataset. 5. This paper is easy to foll
1. The experiments are mainly about the effectiveness of the proposed detection model. Some validation of the new dataset's visual quality would make the work stronger, e.g., human studies and comparisons between datasets. 2. In Table 6, it would be better to also list the accuracy numbers under no perturbations for convenient comparison. 3. More recent related work could be compared or discussed.
This paper is relatively well-motivated as AI-generated image detection is a crucial issue. A new dataset named Chameleon is proposed for detecting AI-generated images. I also find the evaluations thorough. The target issues of the paper are meaningful and worth exploring. The motivation is clear. The paper is easy to follow.
1. Over the past few years, many large and varied collections of AI-generated imagery have been created. An in-depth comparison of the newly introduced Chameleon dataset against its peers, considering factors such as dataset size and the generation techniques used, would underscore its unique contributions. 2. It would be good if the authors collected fake images from fake news on the Internet and conducted experiments on them.
This work's main contribution is that it has proposed a more challenging dataset with diverse categories, with which 9 AI-generated image detectors cannot handle the detection problem well. In the AIGC detection problem, I think it is important to create such a challenging baseline, particularly given the fact that the existing benchmark seems saturated. Besides, a new detector with hybrid features (including frequency and semantics) is developed, and it shows improved performance on the AIGCDe
- This dataset contains only 4 categories, some with only hundreds of fake photos; this may not effectively cover the detection of a wide range of photos. - A recent work also adopts a mixture-of-experts architecture for fake image detection [1], which does not seem to be included as a baseline. Its major differences from AIDE should be discussed, and it could serve as a possibly stronger baseline. - Regarding JPEG comparisons, only QF=95 and QF=90 are evaluated, more real-world compression such
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)
Methods: Softmax · Attention Is All You Need · Contrastive Language-Image Pre-training
