Glitch in the Matrix: A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization

Zhixi Cai; Shreya Ghosh; Abhinav Dhall; Tom Gedeon; Kalin Stefanov; Munawar Hayat

arXiv:2305.01979 · cs.CV · July 18, 2023 · 2 cites

Open Access 1 Repo 1 Models 1 Datasets

TL;DR

This paper introduces a new large-scale benchmark dataset, LAV-DF, for content-driven audio-visual deepfake detection and localization, along with a novel multimodal detection method, BA-TFD+, which outperforms existing approaches.

Contribution

The paper presents the LAV-DF dataset for content-driven audio-visual deepfake detection and localization, and proposes an improved multimodal detection architecture, BA-TFD+, with enhanced accuracy.

Findings

01

BA-TFD+ outperforms baseline in deepfake detection

02

LAV-DF dataset captures content-driven audio-visual manipulations

03

Proposed methods achieve superior localization accuracy

Abstract

Most deepfake detection methods focus on detecting spatial and/or spatio-temporal changes in facial attributes and are centered around the binary classification task of detecting whether a video is real or fake. This is because available benchmark datasets contain mostly visual-only modifications present in the entirety of the video. However, a sophisticated deepfake may include small segments of audio or audio-visual manipulations that can completely change the meaning of the video content. To address this gap, we propose and benchmark a new dataset, Localized Audio Visual DeepFake (LAV-DF), consisting of strategic content-driven audio, visual and audio-visual manipulations. The proposed baseline method, Boundary Aware Temporal Forgery Detection (BA-TFD), is a 3D Convolutional Neural Network-based architecture which effectively captures multimodal manipulations. We further improve…

Code & Models

Repositories

ControlNet/LAV-DF
pytorchOfficial

Models

ControlNet/LAV-DF
model

Datasets

ControlNet/LAV-DF
dataset · 72 dl

Taxonomy

Topics

Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques

Methods

Multi-Head Attention · Attention Is All You Need · Adam · Linear Layer · Label Smoothing · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Residual Connection · Dense Connections