Contextual Cross-Modal Attention for Audio-Visual Deepfake Detection and Localization

Vinaya Sree Katamneni; Ajita Rattani

arXiv:2408.01532 · cs.SD · August 8, 2024 · 2 cites

Open Access · 1 Repo

TL;DR

This paper introduces a novel RNN-based multi-modal attention framework that effectively detects and localizes audio-visual deepfakes by leveraging contextual information, outperforming existing methods on multiple datasets.

Contribution

The paper presents a new multi-modal attention approach using RNNs for improved audio-visual deepfake detection and localization, addressing the modality gap challenge.

Findings

01 Detection accuracy is 3.47% higher than that of existing methods.

02 Localization precision is 2.05% higher than that of existing methods.

03 The approach outperforms existing methods on multiple datasets.

Abstract

In the digital age, the emergence of deepfakes and synthetic media presents a significant threat to societal and political integrity. Deepfakes based on multi-modal manipulation, such as audio-visual, are more realistic and pose a greater threat. Current multi-modal deepfake detectors are often based on the attention-based fusion of heterogeneous data streams from multiple modalities. However, the heterogeneous nature of the data (such as audio and visual signals) creates a distributional modality gap and poses a significant challenge in effective fusion and hence multi-modal deepfake detection. In this paper, we propose a novel multi-modal attention framework based on recurrent neural networks (RNNs) that leverages contextual information for audio-visual deepfake detection. The proposed approach applies attention to multi-modal multi-sequence representations and learns the contributing…
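To make the idea of attention across modalities concrete, the following is a minimal sketch of scaled dot-product cross-modal attention, in which each audio frame attends over the visual frames. It is not the authors' architecture: the RNN component and any learned projections are omitted, and the function names, feature dimensions, and toy values are all assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention from one modality onto another.

    queries: frames of the attending modality (e.g. audio), each a list of floats
    keys, values: frames of the attended modality (e.g. visual)
    Returns (context vectors, attention weights), one row per query frame.
    """
    d = len(keys[0])
    contexts, weights = [], []
    for q in queries:
        # Similarity of this query frame to every frame of the other modality.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Weighted sum of the other modality's value vectors.
        ctx = [sum(wj * v[i] for wj, v in zip(w, values)) for i in range(len(values[0]))]
        contexts.append(ctx)
        weights.append(w)
    return contexts, weights


# Hypothetical toy features: 3 audio frames and 2 visual frames, dimension 2.
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
visual = [[1.0, 0.0], [0.0, 1.0]]

audio_ctx, attn = cross_modal_attention(audio, visual, visual)
```

Each row of `attn` sums to 1 and shows how strongly one audio frame weights each visual frame. A full detector would typically add learned query, key, and value projections and feed the attended sequence to a classifier; those parts are left out of this sketch.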

Code & Models

Repositories

vcbsl/audio-visual-deepfake
PyTorch · Official

Taxonomy

Topics: Digital Media Forensic Detection · Image and Signal Denoising Methods · Advanced Image Processing Techniques

Methods: Softmax · Attention Is All You Need