Adversarial Threats to DeepFake Detection: A Practical Perspective
Paarth Neekhara, Brian Dolhansky, Joanna Bitton, Cristian Canton Ferrer

TL;DR
This paper investigates the vulnerabilities of state-of-the-art DeepFake detection methods to practical adversarial attacks, demonstrating that these detectors can be bypassed using transferable and accessible adversarial examples in black-box scenarios.
Contribution
It introduces techniques to enhance the transferability of adversarial attacks against DeepFake detectors and evaluates their effectiveness on leading DeepFake detection models.
Findings
Adversarial attacks can bypass DeepFake detectors in black-box settings.
Transferable adversarial examples significantly improve attack success rates.
Universal adversarial perturbations enable practical and shareable attack methods.
Abstract
Facially manipulated images and videos, or DeepFakes, can be used maliciously to fuel misinformation or defame individuals. Therefore, detecting DeepFakes is crucial to increase the credibility of social media platforms and other media-sharing websites. State-of-the-art DeepFake detection techniques rely on neural-network-based classification models, which are known to be vulnerable to adversarial examples. In this work, we study the vulnerabilities of state-of-the-art DeepFake detection methods from a practical standpoint. We perform adversarial attacks on DeepFake detectors in a black-box setting where the adversary does not have complete knowledge of the classification models. We study the extent to which adversarial perturbations transfer across different models and propose techniques to improve the transferability of adversarial examples. We also create more accessible attacks using…
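The black-box transfer setting described in the abstract can be illustrated with a toy sketch. This is not the authors' method or code: the two linear "detectors", their weights, the sample `x`, and the step size `eps` are all hypothetical stand-ins, and the attack is a plain one-step signed-gradient (FGSM-style) perturbation crafted only on a surrogate model and then evaluated on a target model the attacker never reads.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fake_probability(w, b, x):
    """Toy linear 'detector': probability that input x is fake."""
    return sigmoid(x @ w + b)


def fgsm_on_surrogate(w_surrogate, x, eps):
    """One signed-gradient step that lowers the surrogate's 'fake' logit.

    For a linear model the gradient of the logit w.r.t. x is simply w,
    so no autodiff is needed in this toy setting.
    """
    return x - eps * np.sign(w_surrogate)


rng = np.random.default_rng(0)
dim = 64
b = 0.0

# Surrogate detector: fully known to the attacker (assumption).
w_surrogate = rng.normal(size=dim)
# Hypothetical target detector: correlated with the surrogate,
# but its weights are never used when crafting the perturbation.
w_target = w_surrogate + 0.3 * rng.normal(size=dim)

# A toy input that both detectors score as likely fake.
x = 0.05 * (w_surrogate + w_target)

eps = 0.2  # L-infinity budget for the perturbation (illustrative value)
x_adv = fgsm_on_surrogate(w_surrogate, x, eps)

p_surrogate_clean = fake_probability(w_surrogate, b, x)
p_surrogate_adv = fake_probability(w_surrogate, b, x_adv)
p_target_clean = fake_probability(w_target, b, x)
p_target_adv = fake_probability(w_target, b, x_adv)

print("surrogate fake probability:", p_surrogate_clean, "->", p_surrogate_adv)
print("target fake probability:   ", p_target_clean, "->", p_target_adv)
```

In this linear toy the perturbation `-eps * sign(w_surrogate)` does not depend on `x`, so it is a degenerate case of an input-agnostic perturbation. Real detectors are deep, nonlinear networks, where transfer is far less reliable and is the problem the paper studies.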
