From Sharpness to Better Generalization for Speech Deepfake Detection
Wen Huang, Xuechen Liu, Xin Wang, Junichi Yamagishi, Yanmin Qian

TL;DR
This paper explores the role of sharpness as a theoretical measure of generalization in speech deepfake detection, demonstrating that reducing sharpness improves robustness across unseen conditions.
Contribution
It introduces sharpness as a theoretical proxy for generalization in SDD and applies sharpness-aware minimization to enhance model robustness.
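For reference, the standard SAM formulation introduced by Foret et al. (2021) is summarized below. It is a sketch of the common definition only: the exact sharpness measure used in this paper may differ (for example, an adaptive or normalized variant), so the symbols L (training loss), w (model weights), ρ (neighborhood radius), ε̂ (approximate worst-case perturbation), S_ρ (sharpness) and η (learning rate) are assumptions here.

```latex
\begin{aligned}
&\min_{w}\; \max_{\|\epsilon\|_2 \le \rho} L(w + \epsilon)
  && \text{(SAM objective)} \\
&S_\rho(w) = \max_{\|\epsilon\|_2 \le \rho} L(w + \epsilon) - L(w)
  \approx L\big(w + \hat{\epsilon}(w)\big) - L(w)
  && \text{(sharpness)} \\
&\hat{\epsilon}(w) = \rho\, \frac{\nabla_w L(w)}{\|\nabla_w L(w)\|_2}
  && \text{(first-order perturbation)} \\
&w_{t+1} = w_t - \eta\, \nabla_w L(w)\big|_{w = w_t + \hat{\epsilon}(w_t)}
  && \text{(SAM update)}
\end{aligned}
```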
Findings
Sharpness increases under domain shifts, indicating higher sensitivity.
Reducing sharpness improves performance on unseen test sets.
Sharpness correlates significantly with generalization in most settings.
Abstract
Generalization remains a critical challenge in speech deepfake detection (SDD). While various approaches aim to improve robustness, generalization is typically assessed through performance metrics like equal error rate without a theoretical framework to explain model performance. This work investigates sharpness as a theoretical proxy for generalization in SDD. We analyze how sharpness responds to domain shifts and find it increases in unseen conditions, indicating higher model sensitivity. Based on this, we apply Sharpness-Aware Minimization (SAM) to reduce sharpness explicitly, leading to better and more stable performance across diverse unseen test sets. Furthermore, correlation analysis confirms a statistically significant relationship between sharpness and generalization in most test settings. These findings suggest that sharpness can serve as a theoretical indicator for…
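The SAM procedure mentioned in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: a toy NumPy logistic-regression "detector" stands in for a real SDD model, sharpness is taken as the common first-order estimate (loss increase after a step of radius rho along the normalized gradient), and the helper names loss_and_grad, sharpness and sam_step, as well as the values of rho and lr, are hypothetical.

```python
import numpy as np

def loss_and_grad(w, X, y):
    """Binary cross-entropy of a logistic model (hypothetical stand-in for a detector)."""
    z = X @ w
    # log(1 + exp(z)) - y*z is a numerically stable form of the BCE loss
    loss = np.mean(np.logaddexp(0.0, z) - y * z)
    p = 1.0 / (1.0 + np.exp(-z))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

def sharpness(w, X, y, rho=0.05):
    """First-order estimate of max_{||e|| <= rho} L(w + e) - L(w) (assumed definition)."""
    loss, g = loss_and_grad(w, X, y)
    e_hat = rho * g / (np.linalg.norm(g) + 1e-12)  # worst-case direction, radius rho
    perturbed_loss, _ = loss_and_grad(w + e_hat, X, y)
    return perturbed_loss - loss

def sam_step(w, X, y, lr=0.1, rho=0.05):
    """One SAM update: perturb to w + e_hat, then descend with the gradient taken there."""
    _, g = loss_and_grad(w, X, y)
    e_hat = rho * g / (np.linalg.norm(g) + 1e-12)
    _, g_sam = loss_and_grad(w + e_hat, X, y)
    return w - lr * g_sam

# Usage on synthetic data (illustrative only; not data from the paper)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(float)
w = np.zeros(5)
for _ in range(100):
    w = sam_step(w, X, y)
print("loss:", loss_and_grad(w, X, y)[0], "sharpness:", sharpness(w, X, y))
```

The main design trade-off is cost: each SAM step needs two gradient evaluations, one at w to find the perturbation and one at the perturbed point to compute the actual update.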
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis
Methods: Sharpness-Aware Minimization
