Straight Through Gumbel Softmax Estimator based Bimodal Neural Architecture Search for Audio-Visual Deepfake Detection

Aravinda Reddy PN; Raghavendra Ramachandra; Krothapalli Sreenivasa Rao; Pabitra Mitra; Vinod Rathod

arXiv:2406.13384 · cs.SD · June 21, 2024 · 1 citation

TL;DR

This paper introduces a neural architecture search framework that uses the Straight-through Gumbel-Softmax (STGS) estimator to find audio-visual fusion architectures for deepfake detection, reporting 94.4% AUC on FakeAVCeleb and SWAN-DF with few model parameters.

Contribution

The paper proposes a two-level search approach that optimizes multimodal fusion architectures specifically for deepfake detection.

Findings

01 Achieved 94.4% AUC on FakeAVCeleb and SWAN-DF datasets.

02 Efficiently identified crucial features from backbone networks.

03 Developed a fusion architecture with minimal model parameters.

Abstract

Deepfakes are a major security risk for biometric authentication. This technology creates realistic fake videos that can impersonate real people, fooling systems that rely on facial features and voice patterns for identification. Existing multimodal deepfake detectors rely on conventional fusion methods, such as majority rule and ensemble voting, which often struggle to adapt to changing data characteristics and complex patterns. In this paper, we introduce the Straight-through Gumbel-Softmax (STGS) framework, offering a comprehensive approach to search multimodal fusion model architectures. Using a two-level search approach, the framework optimizes the network architecture, parameters, and performance. Initially, crucial features were efficiently identified from backbone networks, whereas within the cell structure, a weighted fusion operation integrated information from various…
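The straight-through Gumbel-Softmax idea named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names (`st_gumbel_softmax`, `fuse`), the candidate feature vectors, and the temperature value are all hypothetical choices made for illustration. The sketch shows only the forward pass. Gumbel noise perturbs the architecture logits, a temperature-scaled softmax gives a relaxed (soft) distribution, and an argmax turns it into a discrete one-hot choice. In an autodiff framework, the one-hot vector would be used in the forward pass while gradients flow through the soft probabilities.

```python
import math
import random


def gumbel_noise(rng):
    """Draw one sample from the standard Gumbel(0, 1) distribution."""
    u = rng.random()
    # Clamp away from 0 and 1 so both logarithms stay finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def st_gumbel_softmax(logits, temperature=1.0, rng=None):
    """Forward pass of a straight-through Gumbel-Softmax sample.

    Returns (hard, soft): `hard` is the one-hot choice used in the forward
    pass; `soft` is the relaxed distribution that a framework with
    automatic differentiation would use to compute gradients.
    """
    rng = rng or random.Random()
    perturbed = [(logit + gumbel_noise(rng)) / temperature for logit in logits]
    soft = softmax(perturbed)
    winner = max(range(len(soft)), key=soft.__getitem__)
    hard = [1.0 if i == winner else 0.0 for i in range(len(soft))]
    return hard, soft


def fuse(weights, candidate_outputs):
    """Weighted sum of candidate feature vectors (hypothetical fusion step)."""
    dim = len(candidate_outputs[0])
    return [
        sum(w * out[d] for w, out in zip(weights, candidate_outputs))
        for d in range(dim)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical logits over three candidate fusion inputs.
    hard, soft = st_gumbel_softmax([2.0, 0.5, -1.0], temperature=0.5, rng=rng)
    candidates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print("one-hot choice:", hard)
    print("relaxed weights:", [round(p, 3) for p in soft])
    print("fused (hard):", fuse(hard, candidates))
    print("fused (soft):", fuse(soft, candidates))
```

Under these assumptions, lowering the temperature pushes the soft weights closer to the one-hot choice, while a higher temperature spreads weight across the candidates. With hard weights, `fuse` simply selects one candidate, which is how a discrete architecture decision would look after the search.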

Taxonomy

Topics: Digital Media Forensic Detection · Image and Signal Denoising Methods · Speech and Audio Processing