Statistics-aware Audio-visual Deepfake Detector

Marcella Astrid; Enjie Ghorbel; Djamila Aouada

arXiv:2407.11650 · cs.CV · July 18, 2024

Open Access

TL;DR

This paper introduces an audio-visual deepfake detection method that combines a statistical feature loss, a waveform-based audio representation, and a shallower network to improve accuracy and efficiency over existing approaches.

Contribution

The paper proposes a statistical feature loss, a waveform audio representation, a post-processing normalization of the fakeness score, and a shallower network to improve deepfake detection performance and reduce computational complexity.

Findings

01. Effective detection on DFDC and FakeAVCeleb datasets.

02. Improved discrimination with statistical feature loss.

03. Reduced computational complexity with shallower network.

Abstract

In this paper, we propose an enhanced audio-visual deepfake detection method. Recent methods in audio-visual deepfake detection mostly assess the synchronization between audio and visual features. Although they have shown promising results, they are based on the maximization/minimization of isolated feature distances without considering feature statistics. Moreover, they rely on cumbersome deep learning architectures and are heavily dependent on empirically fixed hyperparameters. Herein, to overcome these limitations, we propose: (1) a statistical feature loss to enhance the discrimination capability of the model, instead of relying solely on feature distances; (2) using the waveform to describe the audio, as a replacement for frequency-based representations; (3) a post-processing normalization of the fakeness score; (4) the use of a shallower network for reducing the computational…
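The abstract contrasts per-sample feature distances with losses that also look at feature statistics, and mentions normalizing the fakeness score afterwards. The sketch below illustrates that general idea only; it is not the authors' formulation, which the truncated abstract does not spell out. The function names, the margin, the choice of Euclidean distance, the specific loss terms (batch mean and standard deviation), and the min-max normalization are all assumptions made for illustration.

```python
import math
import statistics


def feature_distance(audio_feat, visual_feat):
    """Euclidean distance between one audio and one visual feature vector.

    Assumption: both vectors have the same length.
    """
    return math.sqrt(sum((a - v) ** 2 for a, v in zip(audio_feat, visual_feat)))


def statistical_feature_loss(real_dists, fake_dists, margin=1.0):
    """Hypothetical batch-level loss built on distance statistics.

    Assumed form (not taken from the paper): penalize the mean and the
    population standard deviation of distances for real pairs, and
    penalize fake pairs whose mean distance falls below `margin`.
    """
    mu_real = statistics.fmean(real_dists)
    sd_real = statistics.pstdev(real_dists)
    mu_fake = statistics.fmean(fake_dists)
    return mu_real + sd_real + max(0.0, margin - mu_fake)


def normalize_scores(scores):
    """Min-max normalize raw fakeness scores to the range [0, 1].

    Assumption: min-max scaling stands in for the paper's unspecified
    post-processing normalization. Constant inputs map to 0.0.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


if __name__ == "__main__":
    # Toy audio/visual feature pairs; real pairs are close, fake pairs are far.
    real = [feature_distance([0.1, 0.2], [0.1, 0.3]),
            feature_distance([0.5, 0.5], [0.4, 0.5])]
    fake = [feature_distance([0.0, 0.0], [3.0, 4.0]),
            feature_distance([1.0, 1.0], [2.0, 3.0])]
    print("loss:", statistical_feature_loss(real, fake))
    print("normalized scores:", normalize_scores(real + fake))
```

In this toy setup the distance itself plays the role of the fakeness score, so well-synchronized pairs end up near 0 after normalization and poorly synchronized pairs near 1. A loss over batch statistics, unlike a purely per-sample distance loss, also penalizes spread among the real pairs.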

Taxonomy

Topics: Digital Media Forensic Detection · Speech and Audio Processing · Image and Signal Denoising Methods