Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization
Zhixi Cai, Shreya Ghosh, Abhinav Dhall, Tom Gedeon, Kalin Stefanov, Munawar Hayat

TL;DR
This paper introduces a new large-scale benchmark dataset, LAV-DF, for content-driven audio-visual deepfake detection and localization, along with a novel multimodal detection method, BA-TFD+, which outperforms existing approaches.
Contribution
The paper presents the LAV-DF dataset for content-driven audio-visual deepfake detection and localization, and proposes an improved multimodal detection architecture, BA-TFD+, with enhanced accuracy.
Findings
BA-TFD+ outperforms the baseline in deepfake detection
LAV-DF dataset captures content-driven audio-visual manipulations
Proposed methods achieve superior localization accuracy
Abstract
Most deepfake detection methods focus on detecting spatial and/or spatio-temporal changes in facial attributes and are centered around the binary classification task of detecting whether a video is real or fake. This is because available benchmark datasets contain mostly visual-only modifications present in the entirety of the video. However, a sophisticated deepfake may include small segments of audio or audio-visual manipulations that can completely change the meaning of the video content. To address this gap, we propose and benchmark a new dataset, Localized Audio Visual DeepFake (LAV-DF), consisting of strategic content-driven audio, visual and audio-visual manipulations. The proposed baseline method, Boundary Aware Temporal Forgery Detection (BA-TFD), is a 3D Convolutional Neural Network-based architecture which effectively captures multimodal manipulations. We further improve…
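To make the localization task concrete: rather than labeling a whole video real or fake, a localization method predicts the time span of each manipulated segment. The sketch below shows one common way such a prediction can be scored against a ground-truth segment, temporal intersection-over-union (IoU). It is only an illustration. The excerpt above does not state which metrics the paper uses, and the names `Segment`, `temporal_iou`, and `best_match_iou`, as well as the example timestamps, are hypothetical.

```python
from typing import List, Tuple

# A segment is (start_sec, end_sec); this representation is an assumption
# made for illustration, not the dataset's annotation format.
Segment = Tuple[float, float]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Overlap duration divided by the combined duration of two segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def best_match_iou(pred: Segment, truths: List[Segment]) -> float:
    """Highest IoU between one predicted segment and any ground-truth segment."""
    return max((temporal_iou(pred, t) for t in truths), default=0.0)


# Hypothetical example: a one-second fake segment and a prediction that is
# shifted by half a second.
ground_truth = [(2.0, 3.0)]
predicted = (2.5, 3.5)
score = best_match_iou(predicted, ground_truth)
```

A prediction is typically counted as correct when its IoU with some ground-truth segment exceeds a chosen threshold, so short manipulated segments demand precise boundaries.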
Taxonomy
Topics: Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Adam · Linear Layer · Label Smoothing · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Residual Connection · Dense Connections
