Improving Interpretability and Robustness for the Detection of AI-Generated Images
Tatiana Gaintseva, Laida Kushnareva, German Magai, Irina Piontkovskaya, Sergey Nikolenko, Martin Benning, Serguei Barannikov, Gregory Slabaugh

TL;DR
This paper enhances the robustness of AI-generated image detection by analyzing existing methods, proposing embedding and attention head improvements, and introducing a new dataset for better generalization across models.
Contribution
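The paper proposes two ways to make AI-generated image detection more robust: removing harmful components of the embedding vector, and selecting the best-performing attention heads in the image encoder. It also provides a new dataset to support future research in this area.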
Findings
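Mean out-of-distribution (OOD) classification score increased by up to 6% for cross-model transfer.
Analysis of detectors built on frozen CLIP embeddings shows how to interpret them and how images from various AI generators differ from real ones.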
New dataset supports cross-model generalization evaluation.
Abstract
With the growing abilities of generative models, artificial content detection becomes an increasingly important and difficult task. However, all popular approaches to this problem suffer from poor generalization across domains and generative models. In this work, we focus on the robustness of AI-generated image (AIGI) detectors. We analyze existing state-of-the-art AIGI detection methods based on frozen CLIP embeddings and show how to interpret them, shedding light on how images produced by various AI generators differ from real ones. Next, we propose two ways to improve robustness: one based on removing harmful components of the embedding vector, and one based on selecting the best-performing attention heads in the image encoder model. Our methods increase the mean out-of-distribution (OOD) classification score by up to 6% for cross-model transfer. We also propose a new dataset for AIGI detection…
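As a rough illustration of the first idea in the abstract (dropping embedding components that hurt cross-generator transfer), here is a minimal sketch on synthetic data. It is not the authors' procedure: the abstract does not say how harmful components are identified, so the sign-flip criterion, the mean-difference probe, the toy 8-dimensional "embeddings", and every function name below are assumptions made purely for illustration. No CLIP model is involved.

```python
import numpy as np

def fit_mean_difference_probe(X, y):
    """Linear probe whose weight vector is the difference of class means."""
    mu_fake = X[y == 1].mean(axis=0)
    mu_real = X[y == 0].mean(axis=0)
    w = mu_fake - mu_real
    b = -0.5 * w @ (mu_fake + mu_real)  # threshold halfway between class means
    return w, b

def accuracy(w, b, X, y):
    pred = (X @ w + b > 0).astype(int)
    return float((pred == y).mean())

def find_harmful_components(X_src, y_src, X_val, y_val):
    """Assumed criterion: a component is 'harmful' if its real-vs-fake gap
    changes sign between the source generator and a held-out generator."""
    gap_src = X_src[y_src == 1].mean(axis=0) - X_src[y_src == 0].mean(axis=0)
    gap_val = X_val[y_val == 1].mean(axis=0) - X_val[y_val == 0].mean(axis=0)
    return np.flatnonzero(gap_src * gap_val < 0)

def make_toy_embeddings(rng, n, flip):
    """Synthetic stand-in for frozen embeddings (label 1 = generated image).
    Component 0 carries a stable signal; component 1 carries a strong signal
    whose direction depends on the generator (flip=True for the unseen one)."""
    y = rng.integers(0, 2, size=n)
    s = 2 * y - 1
    X = rng.normal(0.0, 1.0, size=(n, 8))
    X[:, 0] += 1.0 * s
    X[:, 1] += 3.0 * s * (-1 if flip else 1)
    return X, y

rng = np.random.default_rng(0)
X_src, y_src = make_toy_embeddings(rng, 500, flip=False)   # training generator
X_val, y_val = make_toy_embeddings(rng, 500, flip=True)    # held-out generator (selection)
X_test, y_test = make_toy_embeddings(rng, 500, flip=True)  # held-out generator (evaluation)

# Baseline: probe trained on all components of the source generator.
w, b = fit_mean_difference_probe(X_src, y_src)
acc_before = accuracy(w, b, X_test, y_test)

# Zero out the flagged components, then refit and re-evaluate.
harmful = find_harmful_components(X_src, y_src, X_val, y_val)
mask = np.ones(X_src.shape[1])
mask[harmful] = 0.0
w_m, b_m = fit_mean_difference_probe(X_src * mask, y_src)
acc_after = accuracy(w_m, b_m, X_test * mask, y_test)

print(f"OOD accuracy before: {acc_before:.2f}, after: {acc_after:.2f}")
```

In this toy setup the generator-specific component dominates the baseline probe and pushes its predictions the wrong way on the unseen generator, so masking it recovers the weaker but stable signal. Real embeddings would need a less crude selection rule and a held-out generator that is kept separate from the final evaluation set, as done here with `X_val` and `X_test`.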
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
Methods: Softmax · Attention Is All You Need · Contrastive Language-Image Pre-training · Focus
