ShiftySpeech: A Large-Scale Synthetic Speech Dataset with Distribution Shifts
Ashi Garg, Zexin Cai, Lin Zhang, Henry Li Xinyuan, Leibny Paola García-Perera, Kevin Duh, Sanjeev Khudanpur, Matthew Wiesner, Nicholas Andrews

TL;DR
ShiftySpeech is a comprehensive synthetic speech dataset designed to evaluate how well detection models generalize under various realistic distribution shifts, revealing significant performance degradation in current methods.
Contribution
We introduce ShiftySpeech, a large-scale benchmark dataset that systematically covers diverse distribution shifts in synthetic speech for robust evaluation.
Findings
Distribution shifts significantly reduce detection accuracy.
Current state-of-the-art methods are vulnerable to distribution shifts.
The benchmark enables detailed analysis of model robustness.
Abstract
The problem of synthetic speech detection has enjoyed considerable attention, with recent methods achieving low error rates across several established benchmarks. However, to what extent can low error rates on academic benchmarks translate to more realistic conditions? In practice, while the training set is fixed at one point in time, test-time conditions may exhibit distribution shifts relative to the training conditions, such as changes in speaker characteristics, emotional expressiveness, language and acoustic conditions, and the emergence of novel synthesis methods. Although some existing datasets target subsets of these distribution shifts, systematic analysis remains difficult due to inconsistencies between source data and synthesis systems across datasets. This difficulty is further exacerbated by the rapid development of new text-to-speech (TTS) and vocoder systems, which…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing
Methods: Sparse Evolutionary Training
