The Database and Benchmark for the Source Speaker Tracing Challenge 2024

Ze Li; Yuke Lin; Tian Yao; Hongbin Suo; Pengyuan Zhang; Yanzhen Ren; Zexin Cai; Hiromitsu Nishizaki; Ming Li

arXiv:2406.04951 · eess.AS · October 8, 2024 · SLT · 2 cites

Open Access

TL;DR

The paper introduces the Source Speaker Tracing Challenge 2024, providing a large-scale database and benchmark for source speaker verification, together with baseline systems and a related conversion method recognition task, to advance research on speaker verification under voice conversion attacks.

Contribution

It presents a new large-scale database, benchmarks, and baseline systems for source speaker verification, addressing data limitations and methodological constraints in the field.

Findings

01. Generated a large-scale converted speech database with 16 VC methods.

02. Developed baseline systems based on the MFA-Conformer architecture.

03. Introduced a conversion method recognition task.

Abstract

Voice conversion (VC) systems can transform audio to mimic another speaker's voice, thereby attacking speaker verification (SV) systems. However, ongoing studies on source speaker verification (SSV) are hindered by limited data availability and methodological constraints. This paper presents the Source Speaker Tracing Challenge (SSTC) at SLT 2024, which aims to fill the gap in databases and benchmarks for the SSV task. In this study, we generate a large-scale converted speech database with 16 common VC methods and train a batch of baseline systems based on the MFA-Conformer architecture. In addition, we introduce a related task called conversion method recognition, with the aim of assisting the SSV task. We expect SSTC to be a platform for advancing the development of the SSV task and to provide further insights into the performance and limitations of current SV systems against VC…
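
As a rough illustration of how an SSV trial list might be scored, the sketch below compares a source-speaker enrollment embedding with an embedding of converted speech and summarizes a set of trials with an equal error rate. The abstract does not specify the scoring back-end or the metric, so cosine similarity and EER are assumptions here, chosen because they are common in speaker verification. The function names `cosine_score` and `equal_error_rate` are hypothetical, and the embeddings would come from a speaker encoder such as the MFA-Conformer baselines, which are not reproduced.

```python
import numpy as np


def cosine_score(a, b):
    """Cosine similarity between two speaker embeddings (assumed scoring back-end)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def equal_error_rate(scores, labels):
    """Approximate EER by sweeping a threshold over every observed score.

    labels: 1 if the converted utterance truly originates from the enrolled
    source speaker (target trial), 0 otherwise (non-target trial).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_target = labels.sum()
    n_nontarget = len(labels) - n_target

    best_gap, eer = None, None
    for threshold in np.unique(scores):
        accept = scores >= threshold
        # False acceptance: non-target trials scored at or above the threshold.
        far = np.sum(accept & (labels == 0)) / n_nontarget
        # False rejection: target trials scored below the threshold.
        frr = np.sum(~accept & (labels == 1)) / n_target
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return float(eer)
```

In this sketch, each trial pairs an enrollment embedding of a candidate source speaker with a converted utterance; a lower EER would indicate that the source speaker remains more traceable after conversion.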

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing