3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement

Siqi Zheng; Luyao Cheng; Yafeng Chen; Hui Wang; Qian Chen

arXiv:2306.15354 · cs.CL · September 26, 2023 · 5 cites

TL;DR

The paper introduces 3D-Speaker, a comprehensive large-scale speech corpus with multi-device, multi-distance, and multi-dialect recordings, designed to advance research in speech representation disentanglement and evaluation of speech models.

Contribution

It provides a novel, large-scale multi-dimensional speech dataset that enables disentanglement research and evaluation of universal speech models across diverse conditions.

Findings

1. The corpus includes over 10,000 speakers with multi-device and multi-distance recordings.
2. It facilitates research on speech representation disentanglement and out-of-domain learning.
3. The dataset supports evaluation of self-supervised speech models.

Abstract

Disentangling uncorrelated information in speech utterances is a crucial research topic within the speech community. Different speech-related tasks focus on extracting distinct speech representations while minimizing the effects of other, uncorrelated information. We present a large-scale speech corpus to facilitate research on speech representation disentanglement. 3D-Speaker contains over 10,000 speakers, each of whom is simultaneously recorded by multiple Devices located at different Distances, and some of whom speak multiple Dialects. The controlled combinations of multi-dimensional audio data yield a matrix covering a diverse blend of entangled speech representations, thereby motivating intriguing methods to untangle them. The multi-domain nature of 3D-Speaker also makes it a suitable resource for evaluating large universal speech models and for experimenting with methods of out-of-domain…
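The idea of controlled combinations can be made concrete with a small sketch. It assumes a hypothetical per-utterance metadata record with speaker, device, distance, and dialect fields; these names are illustrative only and are not the corpus's actual file format. The function collects same-speaker pairs that differ in exactly one recording factor, which is the kind of controlled contrast that lets one source of variation be separated from the others.

```python
from dataclasses import dataclass
from itertools import combinations

# Hypothetical metadata record: field names are illustrative assumptions,
# not the actual 3D-Speaker schema.
@dataclass(frozen=True)
class Utterance:
    utt_id: str
    speaker: str
    device: str
    distance: str
    dialect: str

FACTORS = ("device", "distance", "dialect")

def cross_factor_pairs(utts, factor):
    """Return same-speaker utterance pairs that differ only in `factor`."""
    others = [f for f in FACTORS if f != factor]
    pairs = []
    for a, b in combinations(utts, 2):
        if a.speaker != b.speaker:
            continue  # keep the speaker identity fixed
        if getattr(a, factor) == getattr(b, factor):
            continue  # the target factor must actually vary
        if all(getattr(a, f) == getattr(b, f) for f in others):
            pairs.append((a.utt_id, b.utt_id))  # every other factor held fixed
    return pairs

# Toy example with made-up utterances.
utts = [
    Utterance("u1", "spk1", "phone", "near", "mandarin"),
    Utterance("u2", "spk1", "mic_array", "near", "mandarin"),
    Utterance("u3", "spk1", "phone", "far", "mandarin"),
    Utterance("u4", "spk2", "phone", "near", "mandarin"),
]
print(cross_factor_pairs(utts, "device"))    # [('u1', 'u2')]
print(cross_factor_pairs(utts, "distance"))  # [('u1', 'u3')]
print(cross_factor_pairs(utts, "dialect"))   # []
```

Under these assumptions, each pair holds the speaker and two of the three factors fixed. Any change in a model's embedding similarity across such pairs can therefore be attributed to the single factor that varies.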

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing