Harder or Different? Understanding Generalization of Audio Deepfake Detection
Nicolas M. Müller, Nicholas Evans, Hemlata Tak, Philip Sperl, Konstantin Böttinger

TL;DR
This paper investigates why audio deepfake detection models generalize poorly to deepfakes they were not trained on. It finds that the main cause is that deepfakes produced by different generation models differ from one another, not that newer deepfakes are harder to detect.
Contribution
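It decomposes the performance gap between in-domain and out-of-domain test data into 'hardness' and 'difference' components, and shows that differences between deepfake generation models, rather than detection hardness, are the primary obstacle to generalization.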
Findings
The performance gap is mainly due to differences between deepfake generation models
The hardness component is practically negligible in experiments on ASVspoof databases
Increasing model capacity may not improve generalization
Abstract
Recent research has highlighted a key issue in speech deepfake detection: models trained on one set of deepfakes perform poorly on others. The question arises: is this due to the continuously improving quality of Text-to-Speech (TTS) models, i.e., are newer DeepFakes just 'harder' to detect? Or, is it because deepfakes generated with one model are fundamentally different to those generated using another model? We answer this question by decomposing the performance gap between in-domain and out-of-domain test data into 'hardness' and 'difference' components. Experiments performed using ASVspoof databases indicate that the hardness component is practically negligible, with the performance gap being attributed primarily to the difference component. This has direct implications for real-world deepfake detection, highlighting that merely increasing model capacity, the currently-dominant…
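The abstract names the two components but does not give their formulas here. As a minimal sketch of one plausible reading, assume three error rates are available: a detector trained and tested in-domain, a detector trained and tested on the out-of-domain data, and the in-domain detector tested on the out-of-domain data. The function name, argument names, and the additive split below are assumptions made for illustration, not the authors' definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GapDecomposition:
    gap: float         # total in-domain vs. out-of-domain performance gap
    hardness: float    # part attributed to the new data being harder
    difference: float  # part attributed to the new data being different


def decompose_gap(err_in_on_in: float,
                  err_out_on_out: float,
                  err_in_on_out: float) -> GapDecomposition:
    """Split a generalization gap into hardness and difference (assumed form).

    err_in_on_in:   error of a detector trained and tested on in-domain data
    err_out_on_out: error of a detector trained and tested on out-of-domain data
    err_in_on_out:  error of the in-domain detector on out-of-domain data

    All three are hypothetical inputs in the same unit (e.g. percent error).
    """
    gap = err_in_on_out - err_in_on_in
    # Assumption: if the new data is harder, even a detector trained on it
    # does worse than the in-domain detector does on its own data.
    hardness = err_out_on_out - err_in_on_in
    # Assumption: whatever remains is due to the two domains being different.
    difference = err_in_on_out - err_out_on_out
    return GapDecomposition(gap=gap, hardness=hardness, difference=difference)
```

Under this reading the two components always sum to the total gap, so a small hardness term with a large gap means the detector fails mostly because the unseen deepfakes look different, not because they are intrinsically harder to detect.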
Taxonomy
Topics: Music and Audio Processing · Digital Media Forensic Detection
Methods: Sparse Evolutionary Training
