LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization

Zengrui Jin; Yifan Yang; Mohan Shi; Wei Kang; Xiaoyu Yang; Zengwei Yao; Fangjun Kuang; Liyong Guo; Lingwei Meng; Long Lin; Yong Xu; Shi-Xiong Zhang; Daniel Povey

arXiv:2409.00819·cs.SD·September 4, 2024

TL;DR

This paper introduces LibriheavyMix, a 20,000-hour dataset for single-channel, reverberant, multi-talker speech processing, together with a benchmark pipeline for speech separation, recognition, and speaker diarization in these conditions.

Contribution

A new large-scale dataset and a baseline pipeline benchmark for advancing single-channel multi-talker speech processing in reverberant settings.

Findings

01

The dataset supports research on speech separation, recognition, and speaker diarization.

02

The benchmark pipeline demonstrates applicability across separation, recognition, and diarization tasks.

03

Evaluations confirm the dataset's usefulness for real-world reverberant environments.

Abstract

The evolving speech processing landscape is increasingly focused on complex scenarios like meetings or cocktail parties with multiple simultaneous speakers and far-field conditions. Existing methodologies for addressing these challenges fall into two categories: multi-channel and single-channel solutions. Single-channel approaches, notable for their generality and convenience, do not require specific information about microphone arrays. This paper presents a large-scale far-field overlapping speech dataset, crafted to advance research in speech separation, recognition, and speaker diarization. This dataset is a critical resource for decoding "Who said What and When" in multi-talker, reverberant environments, a daunting challenge in the field. Additionally, we introduce a pipeline system encompassing speech separation, recognition, and diarization as a foundational benchmark.…

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis