Fragile Reconstruction: Adversarial Vulnerability of Reconstruction-Based Detectors for Diffusion-Generated Images
Haoyang Jiang, Mingyang Yi, Shaolei Zhang, Junxian Cai, Qingbin Liu, Xi Chen, Ju Fan

TL;DR
Reconstruction-based detectors for diffusion-generated images are highly vulnerable to imperceptible adversarial attacks, exposing significant security flaws and challenging their reliability.
Contribution
This paper systematically evaluates the adversarial vulnerabilities of reconstruction-based detectors and demonstrates their limited robustness against attacks and defenses.
Findings
Adversarial attacks significantly reduce detection accuracy.
Attacks transfer across different detectors, enabling black-box attacks.
Standard defenses offer limited mitigation against adversarial perturbations.
Abstract
Recently, detecting AI-generated images produced by diffusion-based models has attracted increasing attention due to their potential threat to safety. Among existing approaches, reconstruction-based methods have emerged as a prominent paradigm for this task. However, we find that such methods exhibit severe security vulnerabilities to adversarial perturbations; that is, by adding imperceptible adversarial perturbations to input images, the detection accuracy of classifiers collapses to near zero. To verify this threat, we present a systematic evaluation of the adversarial robustness of three representative detectors across four diverse generative backbone models. First, we construct adversarial attacks in white-box scenarios, which degrade the performance of all well-trained detectors. Moreover, we find that these attacks demonstrate transferability; specifically, attacks crafted…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
