ShiftySpeech: A Large-Scale Synthetic Speech Dataset with Distribution Shifts

Ashi Garg; Zexin Cai; Lin Zhang; Henry Li Xinyuan; Leibny Paola García-Perera; Kevin Duh; Sanjeev Khudanpur; Matthew Wiesner; Nicholas Andrews

arXiv:2502.05674·eess.AS·May 23, 2025

TL;DR

ShiftySpeech is a comprehensive synthetic speech dataset designed to evaluate how well detection models generalize under various realistic distribution shifts, revealing significant performance degradation in current methods.

Contribution

We introduce ShiftySpeech, a large-scale benchmark dataset that systematically covers diverse distribution shifts in synthetic speech for robust evaluation.

Findings

01 Distribution shifts significantly reduce detection accuracy.

02 Current state-of-the-art methods are vulnerable to distribution shifts.

03 Benchmark enables detailed analysis of model robustness.
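
A degradation like the one described in these findings is typically quantified by comparing a detector's error rate on in-distribution test data against its error rate on shifted data. This page does not state which metric the authors report; the sketch below assumes equal error rate (EER), a common choice in synthetic speech detection, and assumes that a higher score means "more likely bona fide". The function name and the toy scores are hypothetical and are not taken from the paper or its repository.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    Assumption: a higher score means the detector considers the utterance
    more likely to be bona fide (real) speech.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "reject everything" operating point is also considered.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    thresholds.append(thresholds[-1] + 1.0)

    best_gap, eer = None, None
    for t in thresholds:
        # False rejection rate: bona fide utterances scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed utterances scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy, made-up scores for illustration only (not results from the paper).
in_dist_eer = equal_error_rate([0.9, 0.8], [0.1, 0.2])   # well separated
shifted_eer = equal_error_rate([0.9, 0.4], [0.1, 0.6])   # overlapping scores
print(f"in-distribution EER: {in_dist_eer:.2f}, shifted EER: {shifted_eer:.2f}")
```

Reporting the two numbers side by side, per shift type, is one simple way to make the robustness gap visible; the threshold sweep here is a coarse approximation and does not interpolate between operating points.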

Abstract

The problem of synthetic speech detection has enjoyed considerable attention, with recent methods achieving low error rates across several established benchmarks. However, to what extent can low error rates on academic benchmarks translate to more realistic conditions? In practice, while the training set is fixed at one point in time, test-time conditions may exhibit distribution shifts relative to the training conditions, such as changes in speaker characteristics, emotional expressiveness, language and acoustic conditions, and the emergence of novel synthesis methods. Although some existing datasets target subsets of these distribution shifts, systematic analysis remains difficult due to inconsistencies between source data and synthesis systems across datasets. This difficulty is further exacerbated by the rapid development of new text-to-speech (TTS) and vocoder systems, which…

Code & Models

Repositories

Ashigarg123/ShiftySpeech
PyTorch · Official

Models

Datasets

ash56/ShiftySpeech
dataset · 1.5k downloads

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing

Methods: Sparse Evolutionary Training