TL;DR
This paper introduces a universal multiplicative attack that effectively eliminates traces left by generative models in DeepFakes, enabling evasion of attribution methods even with defensive measures, highlighting the need for more robust forensic techniques.
Contribution
The paper presents a novel black-box multiplicative attack that is trained solely on real data, removes the traces that generative models (GMs) leave in DeepFakes, and applies across various GMs and attribution methods.
Findings
Achieves an average attack success rate of 97.08% against 6 attribution models.
Maintains a success rate above 72.39% even when defensive mechanisms are applied.
Demonstrates universal applicability across multiple GMs and attribution techniques.
Abstract
Recent advancements in DeepFakes attribution technologies have significantly enhanced forensic capabilities, enabling the extraction of traces left by generative models (GMs) in images and making DeepFakes traceable back to their source GMs. Meanwhile, several attacks have attempted to evade attribution models (AMs) to explore their limitations, calling for more robust AMs. However, existing attacks fail to eliminate GMs' traces and thus can be mitigated by defensive measures. In this paper, we identify that untraceable DeepFakes can be achieved through a multiplicative attack, which can fundamentally eliminate GMs' traces, thereby evading AMs even when they are enhanced with defensive measures. We design a universal, black-box attack method that trains an adversarial model solely on real data, is applicable to various GMs, and is agnostic to AMs. Experimental results demonstrate the outstanding attack…
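The distinction between additive and multiplicative perturbations can be illustrated with a toy signal. The sketch below is not the paper's method: the 1-D signal, the periodic "trace", the random `delta` and `mask`, and all function names are assumptions made for illustration. It only demonstrates a standard Fourier fact: an additive perturbation changes the spectrum by a term that does not depend on the image, whereas a multiplicative one changes it by a term that depends on the image itself (equivalently, it convolves the image spectrum with the mask spectrum).

```python
import numpy as np


def additive_attack(x, delta):
    """Additive perturbation: x' = x + delta."""
    return x + delta


def multiplicative_attack(x, mask):
    """Multiplicative perturbation: x' = x * mask (element-wise)."""
    return x * mask


def spectrum_change(x, x_adv):
    """Spectrum of the perturbed signal minus spectrum of the original."""
    return np.fft.fft(x_adv) - np.fft.fft(x)


rng = np.random.default_rng(0)
n = 256
t = np.arange(n)

# Toy 1-D "image row": smooth content plus a periodic component standing in
# for a generative-model trace (hypothetical, for illustration only).
content = 0.5 + 0.2 * np.sin(2 * np.pi * 3 * t / n)
trace = 0.05 * np.cos(2 * np.pi * 64 * t / n)
x = content + trace

# Hypothetical perturbations; a real attack would learn these.
delta = 0.01 * rng.standard_normal(n)
mask = 1.0 + 0.01 * rng.standard_normal(n)

add_change = spectrum_change(x, additive_attack(x, delta))
mul_change = spectrum_change(x, multiplicative_attack(x, mask))

# Additive case: the FFT is linear, so the change is FFT(delta) whatever x is.
add_independent_of_x = np.allclose(add_change, np.fft.fft(delta))

# Multiplicative case: the change is FFT(x * (mask - 1)), so it depends on x.
mul_depends_on_x = np.allclose(mul_change, np.fft.fft(x * (mask - 1.0)))

# Convolution theorem: FFT(x * mask) = circular_conv(FFT(x), FFT(mask)) / n.
X, M = np.fft.fft(x), np.fft.fft(mask)
idx = np.arange(n)
circ_conv = np.array([np.sum(X * M[(k - idx) % n]) for k in range(n)]) / n
conv_theorem_holds = np.allclose(np.fft.fft(x * mask), circ_conv)

print(add_independent_of_x, mul_depends_on_x, conv_theorem_holds)
```

The three flags confirm the identities numerically. Whether a given mask actually removes an attribution-relevant trace is an empirical question that this toy example does not address.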
Peer Reviews
Decision: ICLR 2026 Poster
1. [effectiveness] The proposed method achieves better attack performance than previous methods, at some cost in SSIM and LPIPS.
1. [clarity] Figure 1 analyzes the spectral property only for GAN models. Do diffusion models show a similar property? Since the paper covers diffusion models as well as GANs, they should not be omitted from this figure.
2. [clarity] The method is effective. However, for a black-box method, efficiency also deserves discussion. How does the proposed method compare with previous methods in terms of compute?
3. [ablation] The proposed method balances three para
1. Clear articulation of why additive attacks preserve fingerprints; frequency-domain evidence and defensive degradation back this claim.
2. Simple but clever training using sampling/transform ops to mimic GM artifacts without GM access, which gives the method broad applicability.
The theory relies on local (first-order) arguments and per-pixel noise assumptions. Modern AMs often include non-linear, patchwise, or frequency pipelines and heavy pre-processing; have you tested the method's performance on such models?
The overall writing quality is clear and well-organized, making the paper easy to follow. The visual demonstrations are also well-designed and effectively support the presented ideas, helping readers better understand the proposed method and its outcomes.
The primary claim of this paper is that existing approaches depend solely on additive perturbations applied to images, which are insufficient to remove the underlying model fingerprints responsible for attribution. To overcome this limitation, the authors propose an adversarial network incorporating multiple constraint loss terms that consider spatial, spectral, and perceptual aspects. However, my main concern is that these constraints appear to focus on controlling visual distortions rather th
