LibriMix: An Open-Source Dataset for Generalizable Speech Separation

Joris Cosentino; Manuel Pariente; Samuele Cornell; Antoine Deleforge; Emmanuel Vincent

arXiv:2005.11262 · eess.AS · May 25, 2020 · 183 cites

TL;DR

This paper introduces LibriMix, an open-source speech separation dataset based on LibriSpeech, designed to improve model generalization across datasets and noisy conditions, with comprehensive evaluation and new test sets.

Contribution

We created LibriMix as a new dataset to address generalization issues in speech separation, providing diverse conditions and a fair evaluation framework.

Findings

01. Compared with WHAM!, models trained on LibriMix show a smaller generalization error.

02. LibriMix improves robustness in noisy and overlapping speech scenarios.

03. The dataset facilitates more realistic speech separation evaluations.

Abstract

In recent years, wsj0-2mix has become the reference dataset for single-channel speech separation. Most deep learning-based speech separation models today are benchmarked on it. However, recent studies have shown important performance drops when models trained on wsj0-2mix are evaluated on other, similar datasets. To address this generalization issue, we created LibriMix, an open-source alternative to wsj0-2mix, and to its noisy extension, WHAM!. Based on LibriSpeech, LibriMix consists of two- or three-speaker mixtures combined with ambient noise samples from WHAM!. Using Conv-TasNet, we achieve competitive performance on all LibriMix versions. In order to fairly evaluate across datasets, we introduce a third test set based on VCTK for speech and WHAM! for noise. Our experiments show that the generalization error is smaller for models trained with LibriMix than with WHAM!, in both clean…
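
To make the idea of "two- or three-speaker mixtures combined with ambient noise" concrete, here is a minimal sketch of how such a mixture could be formed by summing waveforms. It is not the authors' generation pipeline: the function name `mix_sources`, the truncation to the shortest signal, the 16 kHz sample rate, and the random stand-in signals are all assumptions made for illustration, and any loudness or SNR scaling is omitted.

```python
import numpy as np

def mix_sources(sources, noise=None):
    """Sum single-speaker signals (and optional noise) into one mixture.

    Hypothetical helper: all signals are truncated to the shortest length,
    a simplification chosen for this sketch.
    """
    signals = list(sources) + ([noise] if noise is not None else [])
    length = min(len(s) for s in signals)
    stacked = np.stack([np.asarray(s, dtype=np.float64)[:length] for s in signals])
    return stacked.sum(axis=0)

# Random stand-ins for real recordings (assumed 16 kHz sample rate).
rng = np.random.default_rng(0)
speaker_1 = rng.standard_normal(16000)       # 1.0 s of "speech"
speaker_2 = rng.standard_normal(12000)       # 0.75 s of "speech"
ambient = 0.1 * rng.standard_normal(20000)   # quieter "noise"

mixture = mix_sources([speaker_1, speaker_2], noise=ambient)
```

A separation model would take `mixture` as input and try to recover `speaker_1` and `speaker_2`; passing three sources instead of two gives the three-speaker case, and omitting `noise` gives a clean mixture.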

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing