Fragile Reconstruction: Adversarial Vulnerability of Reconstruction-Based Detectors for Diffusion-Generated Images

Haoyang Jiang; Mingyang Yi; Shaolei Zhang; Junxian Cai; Qingbin Liu; Xi Chen; Ju Fan

arXiv:2604.12781·cs.CV·April 15, 2026

TL;DR

Reconstruction-based detectors for diffusion-generated images are highly vulnerable to imperceptible adversarial attacks, exposing significant security flaws and challenging their reliability.

Contribution

This paper systematically evaluates the adversarial vulnerability of reconstruction-based detectors, showing that they have limited robustness to attacks and that standard defenses mitigate the problem only partially.

Findings

01. Adversarial attacks significantly reduce detection accuracy.

02. Attacks transfer across different detectors, enabling black-box attacks.

03. Standard defenses offer limited mitigation against adversarial perturbations.

Abstract

Recently, detecting AI-generated images produced by diffusion-based models has attracted increasing attention because of the safety threat such images pose. Among existing approaches, reconstruction-based methods have emerged as a prominent paradigm for this task. However, we find that such methods exhibit severe security vulnerabilities to adversarial perturbations; that is, by adding imperceptible adversarial perturbations to input images, the detection accuracy of classifiers collapses to near zero. To verify this threat, we present a systematic evaluation of the adversarial robustness of three representative detectors across four diverse generative backbone models. First, we construct adversarial attacks in white-box scenarios, which degrade the performance of all well-trained detectors. Moreover, we find that these attacks demonstrate transferability; specifically, attacks crafted…
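
The threat model in the abstract, a small bounded perturbation that flips a reconstruction-based detector's decision, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the paper's setup: the "reconstruction" is a toy linear projection standing in for a diffusion model's reconstruction, the "image" is a 64-dimensional vector, the threshold and budget are arbitrary, and the attack is a generic PGD-style sign-gradient ascent under an L-infinity budget. The visible abstract does not name the attack algorithm, the detectors, or the budget, and the function names below (`make_projector`, `recon_error`, `is_flagged_generated`, `pgd_attack`) are hypothetical.

```python
import numpy as np


def make_projector(dim, rank, seed=0):
    """Toy 'reconstruction' operator: orthogonal projection onto a random
    rank-dimensional subspace (hypothetical stand-in for a generative model)."""
    rng = np.random.default_rng(seed)
    basis, _ = np.linalg.qr(rng.standard_normal((dim, rank)))
    return basis @ basis.T


def recon_error(x, P):
    """Squared L2 distance between x and its toy reconstruction P @ x."""
    residual = x - P @ x
    return float(residual @ residual)


def is_flagged_generated(x, P, threshold):
    """Toy decision rule: a low reconstruction error means 'generated'."""
    return recon_error(x, P) < threshold


def pgd_attack(x, P, eps, step, iters):
    """Sign-gradient ascent on the reconstruction error, projected back onto
    the L-infinity ball of radius eps around the original input x."""
    x_adv = x.copy()
    for _ in range(iters):
        # Gradient of ||(I - P) x||^2 is 2 (I - P) x, since I - P is a projection.
        grad = 2.0 * (x_adv - P @ x_adv)
        x_adv = x_adv + step * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv


# Assumed toy setup: a 'generated' sample lies almost exactly on the subspace.
DIM, RANK, THRESHOLD, EPS = 64, 8, 0.01, 0.03
P = make_projector(DIM, RANK, seed=0)
rng = np.random.default_rng(1)
x = P @ rng.standard_normal(DIM) + 0.001 * rng.standard_normal(DIM)

x_adv = pgd_attack(x, P, eps=EPS, step=EPS / 4, iters=10)

print("flagged before attack:", is_flagged_generated(x, P, THRESHOLD))
print("flagged after attack: ", is_flagged_generated(x_adv, P, THRESHOLD))
print("max abs perturbation: ", float(np.max(np.abs(x_adv - x))))
```

In this toy, the detector's decision depends on a single scalar, the reconstruction error, and that scalar is differentiable with respect to the input, so a white-box attacker can push it across the threshold while keeping every coordinate within `eps` of the original. Real reconstruction-based detectors involve a diffusion model and a learned classifier, so the sketch only conveys the general mechanism under the stated assumptions.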
