AVT2-DWF: Improving Deepfake Detection with Audio-Visual Fusion and Dynamic Weighting Strategies

Rui Wang; Dengpan Ye; Long Tang; Yunming Zhang; Jiacheng Deng

arXiv:2403.14974 · cs.CV · March 25, 2024 · 1 citation

TL;DR

AVT2-DWF is a dual-transformer framework that detects deepfakes from both audio and visual cues, combining the two modalities through dynamic weight fusion. It reports state-of-the-art results on the DeepfakeTIMIT, FakeAVCeleb, and DFDC datasets.

Contribution

The paper presents a dual-stage audio-visual transformer for deepfake detection. A dynamic weight fusion step combines the heterogeneous audio and visual features, which addresses the multi-modal fusion problem.

Findings

01 Achieves state-of-the-art performance on the DeepfakeTIMIT, FakeAVCeleb, and DFDC datasets.

02 Captures both spatial and temporal features of facial expressions.

03 Improves both intra-dataset and cross-dataset deepfake detection.

Abstract

With the continuous improvements of deepfake methods, forgery messages have transitioned from single-modality to multi-modal fusion, posing new challenges for existing forgery detection algorithms. In this paper, we propose AVT2-DWF, the Audio-Visual dual Transformers grounded in Dynamic Weight Fusion, which aims to amplify both intra- and cross-modal forgery cues, thereby enhancing detection capabilities. AVT2-DWF adopts a dual-stage approach to capture both spatial characteristics and temporal dynamics of facial expressions. This is achieved through a face transformer with an n-frame-wise tokenization strategy encoder and an audio transformer encoder. Subsequently, it uses multi-modal conversion with dynamic weight fusion to address the challenge of heterogeneous information fusion between audio and visual modalities. Experiments on DeepfakeTIMIT, FakeAVCeleb, and DFDC datasets…
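The abstract names dynamic weight fusion as the step that combines the audio and visual streams but does not spell out its form here. The following is a minimal sketch of one common way to weight two modalities dynamically, assuming each encoder has already produced a feature vector of the same length. The function name `dynamic_weight_fusion`, the gate vectors, and the softmax weighting are illustrative assumptions, not the authors' module; the official repository listed under Code & Models is the reference for the actual design.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_weight_fusion(audio_feat, visual_feat, audio_gate, visual_gate):
    """Fuse two same-length feature vectors with input-dependent weights.

    Hypothetical sketch: each modality gets a scalar score from a dot
    product with its gate vector (standing in for learned parameters),
    and a softmax over the two scores yields the fusion weights.
    """
    if len(audio_feat) != len(visual_feat):
        raise ValueError("audio and visual features must have the same length")

    audio_score = sum(g * x for g, x in zip(audio_gate, audio_feat))
    visual_score = sum(g * x for g, x in zip(visual_gate, visual_feat))

    # The weights depend on the current input, which is what makes them dynamic.
    w_audio, w_visual = softmax([audio_score, visual_score])

    fused = [w_audio * a + w_visual * v for a, v in zip(audio_feat, visual_feat)]
    return fused, (w_audio, w_visual)


if __name__ == "__main__":
    fused, weights = dynamic_weight_fusion(
        audio_feat=[1.0, 2.0],
        visual_feat=[3.0, 4.0],
        audio_gate=[0.5, 0.0],
        visual_gate=[0.0, 0.5],
    )
    print(weights, fused)
```

Because the weights are recomputed for every input, a sample whose audio features score higher under the gate leans more on the audio stream, and vice versa. In a trained model the gate vectors would be learned jointly with the two encoders.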

Code & Models

Repositories

raining-dev/avt2-dwf (PyTorch, official)

Taxonomy

Topics: Digital Media Forensic Detection