Adversarially Robust AI-Generated Image Detection for Free: An Information Theoretic Perspective

Ruixuan Zhang; He Wang; Zhengyu Zhao; Zhiqing Guo; Xun Yang; Yunfeng Diao; Meng Wang

arXiv:2505.22604 · cs.CV · June 2, 2025

TL;DR

This paper introduces TRIM, a training-free defense that makes AI-generated image detection robust to adversarial attacks, based on an information-theoretic analysis of feature shifts.

Contribution

The paper presents TRIM, the first training-free adversarial defense for AI-generated image (AIGI) detection. It uses information-theoretic measures to improve robustness without additional training.

Findings

01

TRIM outperforms state-of-the-art defenses by up to 33.88% on key datasets.

02

Feature entanglement causes adversarial training to collapse in AIGI detection by disrupting the preservation of feature-label mutual information.

03

Standard detectors exhibit clear feature separation, aiding in robust detection.

Abstract

Rapid advances in Artificial Intelligence Generated Images (AIGI) have facilitated malicious use, such as forgery and misinformation. Therefore, numerous methods have been proposed to detect fake images. Although such detectors have been proven to be universally vulnerable to adversarial attacks, defenses in this field are scarce. In this paper, we first identify that adversarial training (AT), widely regarded as the most effective defense, suffers from performance collapse in AIGI detection. Through an information-theoretic lens, we further attribute the cause of collapse to feature entanglement, which disrupts the preservation of feature-label mutual information. Instead, standard detectors show clear feature separation. Motivated by this difference, we propose Training-free Robust Detection via Information-theoretic Measures (TRIM), the first training-free adversarial defense for…
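To make the notion of feature-label mutual information concrete, the toy sketch below estimates I(Z; Y) for a single scalar feature with a simple histogram (plug-in) estimator. It is not TRIM and not the estimator used in the paper: the function name `mutual_information`, the bin count, and the synthetic Gaussian features are all assumptions chosen for illustration. It only shows that a feature which separates the two classes carries close to one bit about a balanced binary label, while an entangled feature carries almost none.

```python
import math
import random
from collections import Counter


def mutual_information(feature, labels, bins=16):
    """Plug-in estimate of I(Z; Y) in bits for a scalar feature Z and discrete labels Y.

    Hypothetical helper for illustration; not the paper's estimator.
    """
    lo, hi = min(feature), max(feature)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    # Discretize each feature value into one of `bins` equal-width bins.
    z = [min(int((v - lo) / width), bins - 1) for v in feature]
    n = len(z)
    joint = Counter(zip(z, labels))  # counts of (bin, label) pairs
    pz = Counter(z)                  # counts per bin
    py = Counter(labels)             # counts per label
    # I(Z; Y) = sum over (z, y) of p(z, y) * log2(p(z, y) / (p(z) * p(y)))
    return sum(
        (c / n) * math.log2(c * n / (pz[zi] * py[yi]))
        for (zi, yi), c in joint.items()
    )


rng = random.Random(0)
labels = [0] * 2000 + [1] * 2000  # 0 = real, 1 = generated (synthetic toy labels)

# Separated features: the two classes occupy distinct regions of feature space.
separated = [rng.gauss(-2.0, 0.5) for _ in range(2000)] + [
    rng.gauss(2.0, 0.5) for _ in range(2000)
]
# Entangled features: both classes are drawn from the same distribution.
entangled = [rng.gauss(0.0, 1.0) for _ in range(4000)]

mi_separated = mutual_information(separated, labels)
mi_entangled = mutual_information(entangled, labels)
print(f"separated: {mi_separated:.3f} bits, entangled: {mi_entangled:.3f} bits")
```

Histogram estimators are biased upward on small samples and scale poorly to high-dimensional features, so this is only a one-dimensional intuition pump.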

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: WGAN-GP Loss · 1x1 Convolution · Local Response Normalization · Dense Connections · Convolution · Progressively Growing GAN