SpoofCeleb: Speech Deepfake Detection and SASV In The Wild
Jee-weon Jung, Yihan Wu, Xin Wang, Ji-Hoon Kim, Soumi Maiti, Yuta Matsunaga, Hye-jin Shim, Jinchuan Tian, Nicholas Evans, Joon Son Chung, Wangyou Zhang, Seyun Um, Shinnosuke Takamichi, Shinji Watanabe

TL;DR
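SpoofCeleb is a large dataset for Speech Deepfake Detection (SDD) and Spoofing-robust Automatic Speaker Verification (SASV). It is built by processing real-world VoxCeleb1 recordings into data suitable for TTS training and generating spoofing attacks with multiple TTS systems trained on that data, which allows evaluation under more realistic and diverse conditions.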
Contribution
The paper introduces SpoofCeleb, a comprehensive dataset for SDD and SASV, generated from real-world data and multiple TTS systems, addressing limitations of existing datasets.
Findings
Over 2.5 million utterances from 1,251 speakers.
Baseline results provided for SDD and SASV tasks.
Dataset and protocols are publicly available.
Abstract
This paper introduces SpoofCeleb, a dataset designed for Speech Deepfake Detection (SDD) and Spoofing-robust Automatic Speaker Verification (SASV), utilizing source data from real-world conditions and spoofing attacks generated by Text-To-Speech (TTS) systems also trained on the same real-world data. Training robust recognition systems requires speech data recorded in varied acoustic environments with different levels of noise. However, current datasets typically include only clean, high-quality recordings (bona fide data) because of the requirements of TTS training: studio-quality or well-recorded read speech is typically necessary to train TTS models. Current SDD datasets also have limited usefulness for training SASV models due to insufficient speaker diversity. SpoofCeleb leverages a fully automated pipeline we developed that processes the VoxCeleb1 dataset, transforming it into a…
Taxonomy
Topics: Speech Recognition and Synthesis
