SynVox2: Towards a privacy-friendly VoxCeleb2 dataset

Xiaoxiao Miao; Xin Wang; Erica Cooper; Junichi Yamagishi; Nicholas Evans; Massimiliano Todisco; Jean-François Bonastre; Mickael Rouvier

arXiv:2309.06141 · cs.SD · September 13, 2023

TL;DR

This paper introduces SynVox2, a synthetic version of the VoxCeleb2 dataset, designed to address privacy and ethical concerns while maintaining utility for speaker recognition tasks.

Contribution

The paper presents a method to generate a privacy-preserving synthetic VoxCeleb2 dataset, enabling ethical and legal use in speaker recognition research.

Findings

1. Synthetic data maintains comparable performance in speaker verification.

2. Addresses privacy and legal issues associated with real datasets.

3. Highlights challenges in using synthetic data for downstream tasks.

Abstract

The success of deep learning in speaker recognition relies heavily on the use of large datasets. However, the data-hungry nature of deep learning methods has already been questioned on account of the ethical, privacy, and legal concerns that arise when using large-scale datasets of natural speech collected from real human speakers. For example, the widely used VoxCeleb2 dataset for speaker recognition is no longer accessible from the official website. To mitigate these concerns, this work presents an initiative to generate a privacy-friendly synthetic VoxCeleb2 dataset that ensures the quality of the generated speech in terms of privacy, utility, and fairness. We also discuss the challenges of using synthetic data for the downstream task of speaker verification.

Taxonomy

Topics: Speech Recognition and Synthesis