SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods

Wen Huang; Yanmei Gu; Zhiming Wang; Huijia Zhu; Yanmin Qian

arXiv:2507.21463 · cs.SD · July 30, 2025

TL;DR

SpeechFake is a multilingual dataset of over 3 million deepfake speech samples, generated with 40 speech synthesis tools and spanning 46 languages, designed to support training speech deepfake detectors that generalize to unseen deepfakes.

Contribution

The paper introduces SpeechFake, a large-scale, diverse speech deepfake dataset with detailed analysis and baseline detection results, addressing limitations of existing datasets.

Findings

01. Detection models trained on SpeechFake perform well on unseen data.

02. Generation method, language, and speaker diversity impact detection accuracy.

03. SpeechFake enhances research in speech deepfake detection and robustness.

Abstract

As speech generation technology advances, the risk of misuse through deepfake audio has become a pressing concern, which underscores the critical need for robust detection systems. However, many existing speech deepfake datasets are limited in scale and diversity, making it challenging to train models that can generalize well to unseen deepfakes. To address these gaps, we introduce SpeechFake, a large-scale dataset designed specifically for speech deepfake detection. SpeechFake includes over 3 million deepfake samples, totaling more than 3,000 hours of audio, generated using 40 different speech synthesis tools. The dataset encompasses a wide range of generation techniques, including text-to-speech, voice conversion, and neural vocoders, incorporating the latest cutting-edge methods. It also provides multilingual support, spanning 46 languages. In this paper, we offer a detailed overview…
