Building Corpora for Single-Channel Speech Separation Across Multiple Domains

Matthew Maciejewski; Gregory Sell; Leibny Paola Garcia-Perera; Shinji Watanabe; Sanjeev Khudanpur

arXiv:1811.02641·cs.CL·October 30, 2024·6 cites

TL;DR

This paper develops a method to create realistic synthetic datasets for single-channel speech separation, highlighting the limitations of current models and emphasizing the importance of diverse training data for robustness across different scenarios.

Contribution

It introduces a procedure for building high-quality synthetic overlap datasets from existing corpora, improving the realism and diversity of training data for speech separation models.

Findings

01. Current models underperform on realistic datasets
02. Diverse training data improves model robustness
03. Synthetic datasets can better represent real-world conditions

Abstract

To date, the bulk of research on single-channel speech separation has been conducted using clean, near-field, read speech, which is not representative of many modern applications. In this work, we develop a procedure for constructing high-quality synthetic overlap datasets, necessary for most deep learning-based separation frameworks. We produce datasets that are more representative of realistic applications using the CHiME-5 and Mixer 6 corpora and evaluate standard methods on this data to demonstrate the shortcomings of current source-separation performance. We also demonstrate the value of a wide variety of data in training robust models that generalize well to multiple conditions.
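
To make the idea of a "synthetic overlap" concrete, the sketch below mixes two single-speaker recordings at a chosen signal-to-interference ratio and keeps the scaled sources as separation targets. It is only an illustration under stated assumptions, not the authors' procedure: the function name `mix_at_snr`, the truncation to the shorter signal, and the power-based gain are all hypothetical choices, and the page above does not describe how the paper selects segments or levels.

```python
import numpy as np


def mix_at_snr(target, interferer, snr_db):
    """Overlap two mono signals at a given target-to-interferer ratio (dB).

    Hypothetical helper for illustration; returns the mixture together
    with the two source signals as they appear in the mixture.
    """
    # Assumption: truncate both signals to the shorter length.
    n = min(len(target), len(interferer))
    target = np.asarray(target[:n], dtype=np.float64)
    interferer = np.asarray(interferer[:n], dtype=np.float64)

    # Average power of each signal.
    p_target = np.mean(target ** 2)
    p_interf = np.mean(interferer ** 2)
    if p_target == 0 or p_interf == 0:
        raise ValueError("both signals must have non-zero power")

    # Scale the interferer so that
    # 10 * log10(p_target / p_scaled_interferer) == snr_db.
    gain = np.sqrt(p_target / (p_interf * 10 ** (snr_db / 10)))
    scaled = gain * interferer

    mixture = target + scaled
    return mixture, target, scaled
```

Because the mixture is built by simple addition, the ground-truth sources are known exactly, which is what supervised separation training requires. A realistic pipeline would additionally need to avoid clipping and to choose segments that contain only one active speaker.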

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques