Leave No Stone Unturned: Uncovering Holistic Audio-Visual Intrinsic Coherence for Deepfake Detection
Jielun Peng, Yabin Wang, Yaqi Li, Long Kong, and Xiaopeng Hong

TL;DR
This paper introduces HAVIC, a novel deepfake detection method that leverages intrinsic audio-visual coherence, together with HiFi-AVDF, a new high-fidelity dataset. HAVIC significantly improves detection accuracy across diverse forgeries.
Contribution
HAVIC is the first to jointly model intra- and inter-modal audio-visual coherence for deepfake detection, enhancing robustness and generalization.
Findings
HAVIC outperforms state-of-the-art methods by over 9% in AP and AUC.
The new HiFi-AVDF dataset includes diverse text-to-video and image-to-video forgeries.
Extensive experiments validate HAVIC's superior performance across benchmarks.
Abstract
The rapid progress of generative AI has enabled hyper-realistic audio-visual deepfakes, intensifying threats to personal security and social trust. Most existing deepfake detectors rely either on uni-modal artifacts or on audio-visual discrepancies, and fail to jointly leverage both sources of information. Moreover, detectors that rely on generator-specific artifacts tend to generalize poorly when confronted with unseen forgeries. We argue that robust and generalizable detection should be grounded in intrinsic audio-visual coherence, both within and across modalities. Accordingly, we propose HAVIC, a Holistic Audio-Visual Intrinsic Coherence-based deepfake detector. HAVIC first learns priors of modality-specific structural coherence and of inter-modal micro- and macro-coherence by pre-training on authentic videos. Based on the learned priors, HAVIC further performs holistic adaptive…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Image Enhancement Techniques
