Generalizable Detection of Audio Deepfakes

Jose A. Lopez; Georg Stemmer; Héctor Cordourier Maruri

arXiv:2507.01750·eess.AS·July 3, 2025

Open Access

TL;DR

This paper investigates how pre-trained backbones, data augmentation, and loss functions affect the ability of audio deepfake detectors to generalize across diverse datasets, and reports results that surpass the top-ranked single system in the ASVspoof 5 Challenge.

Contribution

The paper systematically evaluates pre-trained backbones (Wav2Vec2, WavLM, and Whisper), data augmentation strategies, and loss functions to improve the generalization of audio deepfake detection.

Findings

01. Substantial improvements in the generalization of detection models across datasets
02. Outperforms the top-ranked single system in the ASVspoof 5 Challenge
03. Provides insights into optimizing audio models for more robust deepfake detection

Abstract

In this paper, we present our comprehensive study aimed at enhancing the generalization capabilities of audio deepfake detection models. We investigate the performance of various pre-trained backbones, including Wav2Vec2, WavLM, and Whisper, across a diverse set of datasets, including those from the ASVspoof challenges and additional sources. Our experiments focus on the effects of different data augmentation strategies and loss functions on model performance. The results of our research demonstrate substantial enhancements in the generalization capabilities of audio deepfake detection models, surpassing the performance of the top-ranked single system in the ASVspoof 5 Challenge. This study contributes valuable insights into the optimization of audio models for more robust deepfake detection and facilitates future research in this critical area.
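The abstract names data augmentation as one of the factors studied but does not say which augmentations were used. As a purely illustrative sketch, the snippet below shows one common waveform augmentation, additive white Gaussian noise at a target signal-to-noise ratio. The function names `add_noise_at_snr` and `measured_snr_db`, the 10 dB setting, and the synthetic tone are all assumptions made for this example and are not taken from the paper.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng=None):
    """Return a copy of `signal` with white Gaussian noise added at `snr_db` dB.

    Hypothetical helper for illustration; not the paper's augmentation pipeline.
    """
    rng = rng or random.Random()
    # Average power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # Noise power that yields the requested signal-to-noise ratio.
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise_std = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# One second of a 440 Hz tone at 16 kHz, standing in for a speech waveform.
sample_rate = 16000
clean = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]

# Fixed seed so the augmented waveform is reproducible.
noisy = add_noise_at_snr(clean, snr_db=10.0, rng=random.Random(0))
```

In a training setup, an augmentation like this would typically be applied on the fly with a randomly drawn SNR per utterance, so that the detector sees varied recording conditions rather than a single noise level.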

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Music and Audio Processing