Referee: Reference-aware Audiovisual Deepfake Detection
Hyemin Boo, Eunsang Lee, Jiyoung Lee

TL;DR
Referee is a reference-aware audiovisual deepfake detection method that uses identity discrepancies against a biometric anchor to generalize to unseen manipulation techniques, achieving state-of-the-art results on cross-dataset and cross-language evaluations.
Contribution
The paper proposes a reference-aware approach built on identity bottleneck and matching modules, which generalizes better than existing detectors that rely on transient spatiotemporal artifacts.
Findings
Achieves 99.4% AUC on the KoDF dataset.
Achieves state-of-the-art results on cross-dataset and cross-language evaluation protocols (FakeAVCeleb, FaceForensics++, KoDF).
Effectively models speaker-specific cues for robust detection.
Abstract
Deepfakes generated by advanced generative models have rapidly come to pose serious threats, yet existing audiovisual deepfake detection approaches struggle to generalize to unseen manipulation methods. To address this, we propose a novel reference-aware audiovisual deepfake detection method, called Referee, to capture fine-grained identity discrepancies. Unlike existing methods that overfit to transient spatiotemporal artifacts, Referee employs identity bottleneck and matching modules to model the relational consistency of speaker-specific cues captured by a single one-shot example as a biometric anchor. Extensive experiments on FakeAVCeleb, FaceForensics++, and KoDF demonstrate that Referee achieves state-of-the-art results on cross-dataset and cross-language evaluation protocols, including a 99.4% AUC on KoDF. These results highlight that explicitly correlating reference-based biometric…
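To make the reference-matching idea and the AUC metric concrete, here is a minimal sketch in Python. It is not the authors' architecture: the identity bottleneck and matching modules are replaced by a plain cosine comparison between a reference identity embedding and a target embedding, and the embeddings, labels, and function names (cosine_similarity, fake_score, roc_auc) are all hypothetical, chosen only for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def fake_score(reference: Sequence[float], target: Sequence[float]) -> float:
    """Map identity mismatch to [0, 1]; higher means more likely fake.

    Assumption: a stand-in for a learned matching module, not the paper's.
    """
    return (1.0 - cosine_similarity(reference, target)) / 2.0


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a fake (label 1) outscores a real (label 0).

    Ties count as one half. Quadratic in the number of samples.
    """
    fakes = [s for l, s in zip(labels, scores) if l == 1]
    reals = [s for l, s in zip(labels, scores) if l == 0]
    if not fakes or not reals:
        raise ValueError("need at least one sample of each class")
    wins = sum(
        1.0 if f > r else 0.5 if f == r else 0.0
        for f in fakes
        for r in reals
    )
    return wins / (len(fakes) * len(reals))


# Toy data (made up): one reference embedding acts as the biometric anchor.
reference = [1.0, 0.0, 0.0]
targets = [
    [0.9, 0.1, 0.0],   # close to the reference identity
    [0.0, 1.0, 0.0],   # far from the reference identity
    [1.0, 0.05, 0.0],  # close to the reference identity
    [0.2, 0.9, 0.1],   # far from the reference identity
]
labels = [0, 1, 0, 1]  # 0 = real, 1 = fake

scores = [fake_score(reference, t) for t in targets]
print("AUC on toy data:", roc_auc(labels, scores))
```

The sketch only illustrates the general pattern of scoring a target clip against a single reference of the claimed speaker; in the paper, the comparison is learned and operates on audiovisual speaker-specific cues rather than on fixed vectors.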
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Speech and Audio Processing
