SpoofCeleb: Speech Deepfake Detection and SASV In The Wild

Jee-weon Jung; Yihan Wu; Xin Wang; Ji-Hoon Kim; Soumi Maiti; Yuta Matsunaga; Hye-jin Shim; Jinchuan Tian; Nicholas Evans; Joon Son Chung; Wangyou Zhang; Seyun Um; Shinnosuke Takamichi; Shinji Watanabe

arXiv:2409.17285 · cs.SD · April 16, 2025

TL;DR

SpoofCeleb is a large, real-world dataset for speech deepfake detection and speaker verification robustness, created by transforming VoxCeleb1 data and training multiple TTS systems, enabling more realistic and diverse evaluation.

Contribution

The paper introduces SpoofCeleb, a comprehensive dataset for SDD and SASV, generated from real-world data and multiple TTS systems, addressing limitations of existing datasets.

Findings

1. Over 2.5 million utterances from 1,251 speakers.
2. Baseline results provided for SDD and SASV tasks.
3. Dataset and protocols are publicly available.

Abstract

This paper introduces SpoofCeleb, a dataset designed for Speech Deepfake Detection (SDD) and Spoofing-robust Automatic Speaker Verification (SASV). It uses source data recorded in real-world conditions and spoofing attacks generated by Text-To-Speech (TTS) systems trained on the same real-world data. Training robust recognition systems requires speech data recorded in varied acoustic environments with different levels of noise. However, current datasets typically include clean, high-quality recordings (bona fide data) because training TTS models typically requires studio-quality or well-recorded read speech. Current SDD datasets also have limited usefulness for training SASV models due to insufficient speaker diversity. SpoofCeleb leverages a fully automated pipeline we developed that processes the VoxCeleb1 dataset, transforming it into a…

Taxonomy

Topics: Speech Recognition and Synthesis