ERF-BA-TFD+: A Multimodal Model for Audio-Visual Deepfake Detection

Xin Zhang; Jiaming Chu; Jian Zhao; Yuchu Jiang; Xu Yang; Lei Jin; Chi Zhang; Xuelong Li

arXiv:2508.17282 · cs.AI · December 4, 2025

TL;DR

ERF-BA-TFD+ is a multimodal deepfake detection model that combines an enhanced receptive field (ERF) with audio-visual fusion to detect manipulated content in both the audio and video streams more accurately and robustly.

Contribution

The paper introduces ERF-BA-TFD+, a model that captures long-range dependencies in audio-visual data for deepfake detection and achieves state-of-the-art results on the DDL-AV dataset.
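This summary does not say how ERF-BA-TFD+ enlarges its receptive field. One common way to let a temporal model see long-range context is to stack dilated 1D convolutions; the sketch here assumes that technique purely for illustration and computes how many input frames a single output frame can depend on. The function name `temporal_receptive_field` and the layer configurations are hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence


def temporal_receptive_field(kernel_sizes: Sequence[int],
                             dilations: Sequence[int],
                             strides: Optional[Sequence[int]] = None) -> int:
    """Number of input frames that one output frame of a 1D conv stack can see.

    Hypothetical helper for illustration. Uses
    RF = 1 + sum_i (k_i - 1) * d_i * prod_{j<i} s_j.
    """
    if strides is None:
        strides = [1] * len(kernel_sizes)
    if not (len(kernel_sizes) == len(dilations) == len(strides)):
        raise ValueError("kernel_sizes, dilations and strides must have the same length")
    rf = 1
    jump = 1  # spacing, in input frames, between adjacent outputs of the previous layer
    for k, d, s in zip(kernel_sizes, dilations, strides):
        rf += (k - 1) * d * jump  # each layer widens the window by (k - 1) dilated steps
        jump *= s                 # striding spreads later layers' taps further apart
    return rf


if __name__ == "__main__":
    # Hypothetical 4-layer stacks with kernel size 3 and stride 1.
    print(temporal_receptive_field([3, 3, 3, 3], [1, 1, 1, 1]))  # 9
    print(temporal_receptive_field([3, 3, 3, 3], [1, 2, 4, 8]))  # 31
```

With the same four kernel-3 layers, doubling the dilation at each layer raises the receptive field from 9 to 31 frames without adding parameters, which illustrates why larger receptive fields help a detector relate distant frames.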

Findings

1. Achieved state-of-the-art accuracy on the DDL-AV dataset.
2. Outperformed existing methods in detection speed.
3. Won first place in the DDL-AV competition.

Abstract

Deepfake detection is a critical task in identifying manipulated multimedia content. In real-world scenarios, deepfake content can manifest across multiple modalities, including audio and video. To address this challenge, we present ERF-BA-TFD+, a novel multimodal deepfake detection model that combines an enhanced receptive field (ERF) with audio-visual fusion. Our model processes both audio and video features simultaneously, leveraging their complementary information to improve detection accuracy and robustness. The key innovation of ERF-BA-TFD+ lies in its ability to model long-range dependencies within the audio-visual input, allowing it to better capture subtle discrepancies between real and fake content. In our experiments, we evaluate ERF-BA-TFD+ on the DDL-AV dataset, which consists of both segmented and full-length video clips. Unlike previous benchmarks, which focused primarily on…
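The abstract states that the model processes audio and video features simultaneously, but this excerpt does not describe its fusion module. As a minimal sketch, assuming simple early fusion (concatenating temporally aligned per-frame audio and video features) followed by a linear scorer, the code shows how both modalities can feed a single per-frame manipulation score. `fuse_and_score`, the feature dimensions, and the weights are all hypothetical; a real detector would learn its parameters and use much richer encoders.

```python
import math
from typing import List, Sequence


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fuse_and_score(audio_feats: Sequence[Sequence[float]],
                   video_feats: Sequence[Sequence[float]],
                   weights: Sequence[float],
                   bias: float = 0.0) -> List[float]:
    """Hypothetical early fusion: per-frame concat of audio and video, then a linear scorer.

    Returns one probability per frame that the frame is manipulated.
    """
    if len(audio_feats) != len(video_feats):
        raise ValueError("audio and video must be aligned to the same number of frames")
    scores = []
    for a, v in zip(audio_feats, video_feats):
        joint = list(a) + list(v)  # fused feature vector for this frame
        if len(joint) != len(weights):
            raise ValueError("weights must match the fused feature dimension")
        logit = bias + sum(w * x for w, x in zip(weights, joint))
        scores.append(_sigmoid(logit))
    return scores


if __name__ == "__main__":
    # Two frames: 2-D audio features and 1-D video features (toy values).
    audio = [[0.1, 0.2], [0.9, 0.8]]
    video = [[0.0], [1.0]]
    print(fuse_and_score(audio, video, weights=[1.0, 1.0, 2.0], bias=-2.0))
```

Concatenation is the simplest fusion choice because each frame's score can respond to evidence from either stream; attention-based or cross-modal fusion schemes are common alternatives, and nothing here should be read as the paper's design.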
