Referee: Reference-aware Audiovisual Deepfake Detection

Hyemin Boo; Eunsang Lee; Jiyoung Lee

arXiv:2510.27475 · cs.CV · March 16, 2026

TL;DR

Referee introduces a reference-aware audiovisual deepfake detection method that leverages identity discrepancies and biometric anchors to improve generalization across unseen manipulation techniques, achieving state-of-the-art results.

Contribution

The paper proposes a reference-aware approach built on identity bottleneck and matching modules, which generalizes better to unseen manipulations than existing methods that rely on transient spatiotemporal artifacts.

Findings

01. Achieves 99.4% AUC on the KoDF dataset.

02. Outperforms existing methods on cross-dataset and cross-language evaluations.

03. Models speaker-specific cues effectively for robust detection.
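
For readers unfamiliar with the AUC metric quoted in the findings, the following is a minimal sketch of how a ROC AUC can be computed from detector scores. It uses the pairwise (Mann-Whitney) formulation: the fraction of (fake, real) pairs in which the fake sample receives the higher score. The function name `roc_auc`, the label convention (1 = fake, 0 = real), and the toy scores are assumptions made for illustration; they are not taken from the paper or its evaluation code.

```python
def roc_auc(labels, scores):
    """Pairwise ROC AUC. Assumes label 1 = fake, 0 = real (illustrative convention)."""
    fake_scores = [s for y, s in zip(labels, scores) if y == 1]
    real_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not fake_scores or not real_scores:
        raise ValueError("need at least one sample of each class")

    wins = 0.0
    for f in fake_scores:
        for r in real_scores:
            if f > r:
                wins += 1.0   # fake correctly ranked above real
            elif f == r:
                wins += 0.5   # ties count as half a win
    return wins / (len(fake_scores) * len(real_scores))


# Toy data, invented for this example only.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 1.0 means every fake is scored above every real sample, while 0.5 corresponds to chance-level ranking.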

Abstract

Deepfakes generated by advanced generative models pose serious and rapidly growing threats, yet existing audiovisual deepfake detection approaches struggle to generalize to unseen manipulation methods. To address this, we propose a novel reference-aware audiovisual deepfake detection method, called Referee, to capture fine-grained identity discrepancies. Unlike existing methods that overfit to transient spatiotemporal artifacts, Referee employs identity bottleneck and matching modules to model the relational consistency of speaker-specific cues captured by a single one-shot example as a biometric anchor. Extensive experiments on FakeAVCeleb, FaceForensics++, and KoDF demonstrate that Referee achieves state-of-the-art results on cross-dataset and cross-language evaluation protocols, including a 99.4% AUC on KoDF. These results highlight that explicitly correlating reference-based biometric…
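
The general idea of comparing a target clip against a one-shot reference of the claimed speaker can be illustrated with a deliberately simplified sketch. The abstract describes learned identity bottleneck and matching modules; the code below does not reproduce them. It substitutes plain cosine distance between two pre-computed identity embeddings, and every name in it (`cosine_similarity`, `identity_mismatch_score`, the toy vectors) is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def identity_mismatch_score(reference_emb, target_emb):
    """Stand-in score (NOT the paper's matching module): cosine distance.

    Near 0 when the target's identity cues agree with the reference anchor,
    larger when they diverge.
    """
    return 1.0 - cosine_similarity(reference_emb, target_emb)


# Toy embeddings, invented for illustration only.
reference = [1.0, 0.0, 0.0]        # one-shot reference of the claimed speaker
consistent_target = [2.0, 0.0, 0.0]  # same direction as the reference
divergent_target = [0.0, 1.0, 0.0]   # orthogonal to the reference

score_consistent = identity_mismatch_score(reference, consistent_target)
score_divergent = identity_mismatch_score(reference, divergent_target)
```

In this toy setup a higher score would flag the target as less consistent with the reference identity. A real system would derive the embeddings from audio and visual encoders and learn the comparison rather than fixing it to cosine distance.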

Code & Models

Models

eunsanglee/Referee (Hugging Face model)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Speech and Audio Processing