Where the Devil Hides: Deepfake Detectors Can No Longer Be Trusted
Shuaiwei Yuan, Junyu Dong, Yuezun Li

TL;DR
This paper shows that Deepfake detectors are vulnerable to backdoor attacks delivered through poisoned training data. It proposes a stealthy trigger generator and two poisoning scenarios to demonstrate the threat.
Contribution
It introduces a novel trigger generator for backdoor attacks on Deepfake detectors and analyzes two poisoning scenarios, highlighting the security risk for detectors trained on third-party datasets.
Findings
Backdoors can be effectively injected into Deepfake detectors.
Stealthy triggers are hard to detect and can manipulate detector behavior.
Experiments confirm the practicality and effectiveness of the proposed method.
Abstract
With the advancement of AI generative techniques, Deepfake faces have become incredibly realistic and nearly indistinguishable to the human eye. To counter this, Deepfake detectors have been developed as reliable tools for assessing face authenticity. These detectors are typically built on Deep Neural Networks (DNNs) and trained using third-party datasets. However, this protocol raises a new security risk that can seriously undermine the trustworthiness of Deepfake detectors: once third-party data providers maliciously insert poisoned (corrupted) data, Deepfake detectors trained on these datasets will be injected with "backdoors" that cause abnormal behavior when presented with samples containing specific triggers. This is a practical concern, as third-party providers may distribute or sell these triggers to malicious users, allowing them to manipulate detector performance and escape…
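The poisoning step the abstract describes can be illustrated with a minimal sketch. It assumes a fixed, low-amplitude additive trigger and a simple label-flipping scheme, which is a generic backdoor setup and not the paper's learned trigger generator. The names `apply_trigger` and `poison_dataset`, the label convention (1 = fake, 0 = real), and the `strength` and `rate` values are all hypothetical.

```python
import random


def apply_trigger(image, trigger, strength=0.05):
    """Add a scaled trigger pattern to an image and clip to [0, 1].

    `image` and `trigger` are 2D lists of floats with the same shape.
    A small `strength` keeps the change visually subtle (assumed value).
    """
    return [
        [min(1.0, max(0.0, pixel + strength * t)) for pixel, t in zip(img_row, trg_row)]
        for img_row, trg_row in zip(image, trigger)
    ]


def poison_dataset(samples, trigger, rate=0.1, seed=0):
    """Return (poisoned_samples, poisoned_indices).

    `samples` is a list of (image, label) pairs with label 1 = fake and
    0 = real (assumed convention). A fraction `rate` of the fake samples
    receives the trigger and is relabelled as real, so a detector trained
    on the result may learn to associate the trigger with "real".
    """
    rng = random.Random(seed)
    fake_indices = [i for i, (_, label) in enumerate(samples) if label == 1]
    n_poison = int(len(fake_indices) * rate)
    poisoned_indices = set(rng.sample(fake_indices, n_poison))

    poisoned_samples = []
    for i, (image, label) in enumerate(samples):
        if i in poisoned_indices:
            poisoned_samples.append((apply_trigger(image, trigger), 0))
        else:
            poisoned_samples.append((image, label))
    return poisoned_samples, poisoned_indices


# Toy data: twenty 2x2 grey "images", alternating real (0) and fake (1).
toy_trigger = [[1.0, -1.0], [-1.0, 1.0]]
toy_samples = [([[0.5, 0.5], [0.5, 0.5]], i % 2) for i in range(20)]
toy_poisoned, toy_indices = poison_dataset(toy_samples, toy_trigger, rate=0.2)
```

At inference time, an attacker holding the same trigger would apply `apply_trigger` to a fake face in the hope that the backdoored detector classifies it as real. Whether that succeeds depends on the model and training setup, which this sketch does not cover.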
