A Unified Framework for Modality-Agnostic Deepfakes Detection
Cai Yu, Peng Chen, Jiahe Tian, Jin Liu, Jiao Dai, Xi Wang, Yesheng Chai, Shan Jia, Siwei Lyu, Jizhong Han

TL;DR
This paper presents a modality-agnostic deepfake detection framework that identifies manipulated audio, video, or cross-modal content, even when one modality is missing, by leveraging speech correlation features and dual-label detection.
Contribution
The work introduces a modality-agnostic detection framework that handles missing modalities and uses audio-visual speech recognition (AVSR) to capture cross-modal forgery clues, outperforming existing methods.
Findings
Outperforms state-of-the-art detection methods on three datasets.
Effective in scenarios with missing or manipulated modalities.
Supports independent detection of audio and visual manipulations.
Abstract
As AI-generated content (AIGC) thrives, deepfakes have expanded from single-modality falsification to cross-modal fake content creation, where either audio or visual components can be manipulated. While using two unimodal detectors can detect audio-visual deepfakes, cross-modal forgery clues could be overlooked. Existing multimodal deepfake detection methods typically establish correspondence between the audio and visual modalities for binary real/fake classification, and require the co-occurrence of both modalities. However, in real-world multimodal applications, missing-modality scenarios may occur where either modality is unavailable. In such cases, audio-visual detection methods are less practical than two independent unimodal methods. Consequently, the detector cannot always obtain the number or type of manipulated modalities beforehand, necessitating a fake-modality-agnostic…
Taxonomy
Topics: Digital Media Forensic Detection · Music and Audio Processing · Generative Adversarial Networks and Image Synthesis
