Is It Possible to Backdoor Face Forgery Detection with Natural Triggers?
Xiaoxuan Han, Songlin Yang, Wei Wang, Ziwen He, Jing Dong

TL;DR
This paper introduces a backdoor attack on face forgery detection models that uses natural triggers. The attack achieves a high success rate and robustness while being harder for humans to notice, which exposes a new security challenge for these detectors.
Contribution
The paper proposes an analysis-by-synthesis backdoor attack that embeds natural triggers in the latent space and targets face forgery detection models. The attack is evaluated with state-of-the-art generative models and comprehensive experiments.
Findings
Achieves over 99% attack success rate with minimal accuracy drop
Remains effective against existing backdoor defenses
Less detectable to humans in user studies
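The two quantities in the first finding can be made concrete with a small sketch. It uses the common definitions: attack success rate is the fraction of triggered inputs that the backdoored model assigns to the attacker's target label, and accuracy drop is clean accuracy before poisoning minus clean accuracy after. The label encoding (`REAL`, `FAKE`), the choice of `REAL` as the target label, and the toy predictions are assumptions made for illustration; this summary does not give the paper's exact evaluation protocol.

```python
def attack_success_rate(preds_on_triggered, target_label):
    """Fraction of triggered inputs classified as the attacker's target label."""
    if not preds_on_triggered:
        raise ValueError("need at least one prediction")
    hits = sum(1 for p in preds_on_triggered if p == target_label)
    return hits / len(preds_on_triggered)


def accuracy(preds, labels):
    """Fraction of predictions that match the ground-truth labels."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and the same length")
    correct = sum(1 for p, y in zip(preds, labels) if p == y)
    return correct / len(labels)


# Hypothetical label encoding; the paper's own encoding is not given here.
REAL, FAKE = 0, 1

# Toy predictions of a backdoored detector on forged images carrying the trigger.
triggered_preds = [REAL, REAL, REAL, FAKE]
asr = attack_success_rate(triggered_preds, target_label=REAL)

# Toy predictions on clean (trigger-free) test data, before and after poisoning.
clean_labels = [REAL, FAKE, FAKE, REAL]
acc_before = accuracy([REAL, FAKE, FAKE, REAL], clean_labels)
acc_after = accuracy([REAL, FAKE, REAL, REAL], clean_labels)
accuracy_drop = acc_before - acc_after

print(f"ASR: {asr:.2%}, clean accuracy drop: {accuracy_drop:.2%}")
```

A backdoor is considered stealthy at the model level when the attack success rate is high while the clean accuracy drop stays close to zero, which is the pattern the finding above reports.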
Abstract
Deep neural networks have significantly improved the performance of face forgery detection models in discriminating Artificial Intelligence Generated Content (AIGC). However, their security is significantly threatened by the injection of triggers during model training (i.e., backdoor attacks). Although existing backdoor defenses and manual data selection can mitigate attacks that use human-eye-sensitive triggers, such as patches or adversarial noise, the more challenging natural backdoor triggers remain insufficiently researched. To further investigate natural triggers, we propose a novel analysis-by-synthesis backdoor attack against face forgery detection models, which embeds natural triggers in the latent space. We thoroughly study such backdoor vulnerability from two perspectives: (1) Model Discrimination (Optimization-Based Trigger): we adopt a substitute detection model and find the…
Taxonomy
Topics: Face recognition and analysis · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis
Methods: Adaptive Instance Normalization · R1 Regularization · Convolution · Dense Connections · Diffusion · Feedforward Network · StyleGAN
