TL;DR
SpeechFake is a comprehensive, multilingual dataset with over 3 million deepfake audio samples from 40 synthesis methods, designed to improve detection of speech deepfakes and support robust model development.
Contribution
The paper introduces SpeechFake, a large-scale, diverse speech deepfake dataset with detailed analysis and baseline detection results, addressing limitations of existing datasets.
Findings
Detection models trained on SpeechFake perform well on unseen data.
Generation method, language, and speaker diversity impact detection accuracy.
SpeechFake supports further research on speech deepfake detection and model robustness.
Abstract
As speech generation technology advances, the risk of misuse through deepfake audio has become a pressing concern, which underscores the critical need for robust detection systems. However, many existing speech deepfake datasets are limited in scale and diversity, making it challenging to train models that generalize well to unseen deepfakes. To address these gaps, we introduce SpeechFake, a large-scale dataset designed specifically for speech deepfake detection. SpeechFake includes over 3 million deepfake samples, totaling more than 3,000 hours of audio, generated using 40 different speech synthesis tools. The dataset encompasses a wide range of generation techniques, including text-to-speech, voice conversion, and neural vocoders, and incorporates recent state-of-the-art methods. It also provides multilingual support, spanning 46 languages. In this paper, we offer a detailed overview…