A Unified Framework for Modality-Agnostic Deepfakes Detection

Cai Yu; Peng Chen; Jiahe Tian; Jin Liu; Jiao Dai; Xi Wang; Yesheng Chai; Shan Jia; Siwei Lyu; Jizhong Han

arXiv:2307.14491 · cs.MM · October 28, 2024 · 1 citation

Open Access

TL;DR

This paper presents a modality-agnostic deepfake detection framework that identifies manipulated audio, manipulated video, or cross-modal forgeries, even when one modality is missing, by leveraging speech correlation features and dual-label detection.

Contribution

The work introduces a modality-agnostic detection framework that handles missing modalities and uses audio-visual speech recognition (AVSR) to extract cross-modal forgery clues, outperforming existing methods.

Findings

01. Outperforms state-of-the-art detection methods on three datasets.

02. Effective in scenarios with missing or manipulated modalities.

03. Supports independent detection of audio and visual manipulations.
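
To make the dual-label idea concrete, here is a minimal Python sketch. This page does not describe the paper's actual label encoding or loss, so everything below is an assumption made for illustration: the `DualLabel` class, the use of `None` to mark a missing modality, and the masked binary cross-entropy are hypothetical and are not taken from the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional
import math


@dataclass
class DualLabel:
    """Hypothetical label pair: one flag per modality; None marks a missing modality."""
    audio_fake: Optional[bool]
    visual_fake: Optional[bool]


def is_deepfake(label: DualLabel) -> bool:
    """Treat a sample as fake if any available modality is manipulated."""
    return bool(label.audio_fake) or bool(label.visual_fake)


def bce(p: float, y: bool) -> float:
    """Binary cross-entropy for one predicted probability and one target."""
    eps = 1e-7
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -math.log(p) if y else -math.log(1.0 - p)


def dual_label_loss(p_audio: float, p_visual: float, label: DualLabel) -> float:
    """Average the per-modality losses, skipping any modality that is missing."""
    terms = []
    if label.audio_fake is not None:
        terms.append(bce(p_audio, label.audio_fake))
    if label.visual_fake is not None:
        terms.append(bce(p_visual, label.visual_fake))
    return sum(terms) / len(terms) if terms else 0.0
```

Keeping one label per modality lets a detector report which stream was manipulated rather than a single real/fake verdict, and masking the missing stream keeps an absent modality from contributing to the loss.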

Abstract

As AI-generated content (AIGC) thrives, deepfakes have expanded from single-modality falsification to cross-modal fake content creation, where either audio or visual components can be manipulated. While using two unimodal detectors can detect audio-visual deepfakes, cross-modal forgery clues could be overlooked. Existing multimodal deepfake detection methods typically establish correspondence between the audio and visual modalities for binary real/fake classification, and require the co-occurrence of both modalities. However, in real-world multimodal applications, missing-modality scenarios may occur where either modality is unavailable. In such cases, audio-visual detection methods are less practical than two independent unimodal methods. Consequently, the detector cannot always obtain the number or type of manipulated modalities beforehand, necessitating a fake-modality-agnostic…

Taxonomy

Topics: Digital Media Forensic Detection · Music and Audio Processing · Generative Adversarial Networks and Image Synthesis