Adversarially Robust AI-Generated Image Detection for Free: An Information Theoretic Perspective
Ruixuan Zhang, He Wang, Zhengyu Zhao, Zhiqing Guo, Xun Yang, Yunfeng Diao, Meng Wang

TL;DR
This paper introduces TRIM, a training-free method for detecting AI-generated images that is robust against adversarial attacks, based on an information-theoretic analysis of feature shifts.
Contribution
The paper presents TRIM, the first training-free adversarial defense for AIGI detection, which leverages information-theoretic measures to improve robustness.
Findings
TRIM outperforms state-of-the-art defenses by up to 33.88% on key datasets.
Feature entanglement causes the performance collapse of adversarial training (AT) in AIGI detectors.
In contrast, standard detectors exhibit clear feature separation, which motivates TRIM's design.
Abstract
Rapid advances in Artificial Intelligence Generated Images (AIGI) have facilitated malicious use, such as forgery and misinformation. As a result, numerous methods have been proposed to detect fake images. Although such detectors have been shown to be universally vulnerable to adversarial attacks, defenses in this field are scarce. In this paper, we first identify that adversarial training (AT), widely regarded as the most effective defense, suffers from performance collapse in AIGI detection. Through an information-theoretic lens, we further attribute the cause of this collapse to feature entanglement, which disrupts the preservation of feature-label mutual information. In contrast, standard detectors show clear feature separation. Motivated by this difference, we propose Training-free Robust Detection via Information-theoretic Measures (TRIM), the first training-free adversarial defense for…
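The abstract's central quantity is the mutual information I(Z; Y) between a detector's features Z and the real/fake label Y. As a rough illustration only (this is not TRIM, whose details are not given here), the sketch below computes a plug-in histogram estimate of I(Z; Y) for a one-dimensional feature. The function name `mutual_information`, the equal-width binning, and the toy data are all assumptions made for illustration. Well-separated features retain the full label entropy (1 bit for balanced binary labels), while fully entangled features retain none.

```python
import math
from collections import Counter


def mutual_information(features, labels, bins=10):
    """Hypothetical helper: plug-in estimate of I(Z; Y) in bits.

    Z is a 1-D feature discretized into `bins` equal-width bins over its
    observed range; Y is a discrete label. Illustrative only, not TRIM.
    """
    lo, hi = min(features), max(features)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature
    # Map each feature value to a bin index in [0, bins - 1].
    z = [min(int((f - lo) / width), bins - 1) for f in features]
    n = len(labels)
    joint = Counter(zip(z, labels))  # counts of (bin, label) pairs
    pz = Counter(z)                  # marginal counts of bins
    py = Counter(labels)             # marginal counts of labels
    mi = 0.0
    for (zi, yi), c in joint.items():
        p_joint = c / n
        mi += p_joint * math.log2(p_joint / ((pz[zi] / n) * (py[yi] / n)))
    return mi


# Toy data (hypothetical): 50 "real" (label 0) and 50 "fake" (label 1) samples.
labels = [0] * 50 + [1] * 50
separated = [-1.0] * 50 + [1.0] * 50  # classes occupy disjoint feature regions
entangled = [-1.0, 1.0] * 50          # both classes share one feature distribution

print(mutual_information(separated, labels))  # 1.0
print(mutual_information(entangled, labels))  # 0.0
```

With continuous, partially overlapping features the estimate falls between these two extremes and depends on the bin count; equal-width binning is chosen here only for simplicity.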
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: WGAN-GP Loss · 1x1 Convolution · Local Response Normalization · Dense Connections · Convolution · Progressively Growing GAN
