Toward Noise-Aware Audio Deepfake Detection: Survey, SNR-Benchmarks, and Practical Recipes

Udayon Sen; Alka Luqman; Anupam Chattopadhyay

arXiv:2512.13744 · cs.SD · December 17, 2025

TL;DR

This paper surveys and evaluates the robustness of state-of-the-art audio deepfake detection models under noisy conditions, introducing benchmarks and practical training strategies to improve real-world performance.

Contribution

The paper provides a reproducible framework for noise evaluation, benchmarking detectors at controlled SNRs, and analyzes how multi-condition training affects detection accuracy.
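As a worked reference for what "controlled SNR" means, here are the standard power-ratio definition and the noise gain it implies. This sketch assumes that signal power is the mean squared amplitude over the whole utterance. This page does not say whether the framework measures power over the full clip or only over active speech. Here $s$ is the clean utterance, $n$ the noise clip, $\alpha$ the noise gain, and $\mathrm{SNR}^{\star}$ the target level.

```latex
\begin{aligned}
P_x &= \frac{1}{T}\sum_{t=1}^{T} x[t]^2, \qquad
\mathrm{SNR}_{\mathrm{dB}} = 10\log_{10}\frac{P_s}{P_n} \\
y[t] &= s[t] + \alpha\, n[t], \qquad P_{\alpha n} = \alpha^2 P_n \\
10\log_{10}\frac{P_s}{\alpha^2 P_n} &= \mathrm{SNR}^{\star}
\;\Longrightarrow\;
\alpha = \sqrt{\frac{P_s}{P_n\,10^{\mathrm{SNR}^{\star}/10}}}
\end{aligned}
```

At the ends of the sweep the abstract mentions, a 35 dB target puts the scaled noise power about $10^{3.5} \approx 3162$ times below the speech power. A -5 dB target makes it about $10^{0.5} \approx 3.16$ times larger than the speech power.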

Findings

01

Fine-tuning reduces EER by 10 to 15 percentage points at SNRs between 10 and 0 dB.

02

Performance degrades gracefully with increasing noise levels.

03

Benchmarking under controlled SNR conditions reveals robustness gaps in current models.
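Finding 01 is stated in EER percentage points, so a minimal sketch of how an equal error rate is typically computed from detector scores may help. It assumes that higher scores mean "more likely bona fide" (label 1) and that spoofs are label 0. The function name `compute_eer` is hypothetical, and this simple threshold sweep is not necessarily the paper's evaluation code. Official challenge scoring scripts may differ in tie handling and interpolation.

```python
import numpy as np


def compute_eer(scores, labels):
    """Equal error rate via a threshold sweep (hypothetical helper).

    labels: 1 = bona fide, 0 = spoof; a trial is accepted when score >= threshold.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Candidate thresholds: every observed score, plus +inf (reject everything).
    thresholds = np.append(np.unique(scores), np.inf)
    accept = scores[None, :] >= thresholds[:, None]  # shape (T, N)
    # False acceptance rate: fraction of spoofs accepted at each threshold.
    far = (accept & (labels == 0)).sum(axis=1) / (labels == 0).sum()
    # False rejection rate: fraction of bona fide trials rejected.
    frr = (~accept & (labels == 1)).sum(axis=1) / (labels == 1).sum()
    i = np.argmin(np.abs(far - frr))
    return float((far[i] + frr[i]) / 2), float(thresholds[i])
```

A reduction "by 10 to 15 percentage points" is an absolute difference between two such EERs. For instance, a hypothetical drop from 30% to 18% EER is 12 points, which is a 40% relative reduction.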

Abstract

Deepfake audio detection has progressed rapidly with strong pretrained encoders (e.g., WavLM, Wav2Vec2, MMS). However, performance in realistic capture conditions (background noise from domestic, office, and transport settings; room reverberation; and consumer channels) often lags clean-lab results. We survey and evaluate the robustness of state-of-the-art audio deepfake detection models and present a reproducible framework that mixes MS-SNSD noises with ASVspoof 2021 DF utterances to evaluate them under controlled signal-to-noise ratios (SNRs). SNR is a measured proxy for noise severity, widely used in speech research. It lets us sweep from near-clean (35 dB) to very noisy (-5 dB) to quantify graceful degradation. We study multi-condition training and fixed-SNR testing for pretrained encoders (WavLM, Wav2Vec2, MMS), reporting accuracy, ROC-AUC, and EER on binary and four-class (authenticity × corruption) tasks. In our…
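The mixing step the abstract describes can be sketched as follows. Everything here is an assumption rather than the authors' code. That includes the helper names `mix_at_snr` and `four_class_label`, the looping and random cropping of noise to the utterance length, and whole-clip power estimation. The 5 dB grid between 35 and -5 dB is also assumed, because the abstract gives only the endpoints. Finally, the sketch reads "corruption" as clean versus noisy, with the class ordering shown.

```python
import numpy as np

# Hypothetical SNR grid: only the endpoints (35 dB, -5 dB) come from the abstract;
# the 5 dB step is an assumption.
SNR_GRID_DB = list(range(35, -10, -5))  # 35, 30, ..., 0, -5


def mix_at_snr(speech, noise, snr_db, rng=None):
    """Return speech + scaled noise whose SNR equals `snr_db` (hypothetical helper).

    Power is estimated as the mean squared amplitude over the whole clip.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    rng = np.random.default_rng() if rng is None else rng

    # Loop short noise clips, then take a random crop matching the utterance length.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    start = rng.integers(0, len(noise) - len(speech) + 1)
    noise = noise[start:start + len(speech)]

    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12  # guard against an all-zero noise clip
    # Solve 10*log10(p_speech / (gain**2 * p_noise)) = snr_db for the noise gain.
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def four_class_label(is_spoof, is_noisy):
    """Map (authenticity, corruption) to one of four classes (assumed ordering).

    0 = bona fide/clean, 1 = bona fide/noisy, 2 = spoof/clean, 3 = spoof/noisy.
    """
    return 2 * int(is_spoof) + int(is_noisy)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = rng.standard_normal(16000)  # stand-in for a 1 s utterance at 16 kHz
    noise = rng.standard_normal(4000)    # stand-in for a short noise clip
    for snr in SNR_GRID_DB:
        mixed = mix_at_snr(speech, noise, snr, rng=rng)
        measured = 10 * np.log10(np.mean(speech ** 2) / np.mean((mixed - speech) ** 2))
        print(f"target {snr:>3} dB -> measured {measured:6.2f} dB")
```

Scaling only the noise keeps the speech level fixed, so a detector sees the same utterance at every SNR. A full pipeline would also need to handle clipping (for example, by peak-normalizing the mixture), which this sketch omits.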

Taxonomy

Topics: Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis · Speech Recognition and Synthesis