LibriMix: An Open-Source Dataset for Generalizable Speech Separation
Joris Cosentino, Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent

TL;DR
This paper introduces LibriMix, an open-source speech separation dataset based on LibriSpeech, designed to improve model generalization across datasets and noisy conditions, with comprehensive evaluation and new test sets.
Contribution
We created LibriMix as a new dataset to address generalization issues in speech separation, providing diverse conditions and a fair evaluation framework.
Findings
Models trained on LibriMix show a smaller generalization error than models trained on WHAM!.
LibriMix improves robustness in noisy and overlapping speech scenarios.
The dataset facilitates more realistic speech separation evaluations.
Abstract
In recent years, wsj0-2mix has become the reference dataset for single-channel speech separation. Most deep learning-based speech separation models today are benchmarked on it. However, recent studies have shown important performance drops when models trained on wsj0-2mix are evaluated on other, similar datasets. To address this generalization issue, we created LibriMix, an open-source alternative to wsj0-2mix, and to its noisy extension, WHAM!. Based on LibriSpeech, LibriMix consists of two- or three-speaker mixtures combined with ambient noise samples from WHAM!. Using Conv-TasNet, we achieve competitive performance on all LibriMix versions. In order to fairly evaluate across datasets, we introduce a third test set based on VCTK for speech and WHAM! for noise. Our experiments show that the generalization error is smaller for models trained with LibriMix than with WHAM!, in both clean…
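The abstract describes each LibriMix example as a two- or three-speaker mixture, optionally combined with an ambient noise sample. A minimal sketch of that idea follows, assuming signals are plain lists of samples and that longer signals are truncated to the shortest one. The function name `mix_sources`, the truncation rule, and the toy random signals are illustrative assumptions, not the authors' generation scripts.

```python
import random

def mix_sources(sources, noise=None):
    """Sum source signals (and optional noise) sample by sample.

    Signals are truncated to the shortest length. This is a
    simplifying assumption for the sketch, not the official recipe.
    """
    signals = list(sources) + ([noise] if noise is not None else [])
    n = min(len(s) for s in signals)
    return [sum(s[i] for s in signals) for i in range(n)]

# Toy stand-ins for two speech utterances and one noise clip
# (hypothetical data; real examples would be loaded from audio files).
rng = random.Random(0)
speaker_1 = [rng.uniform(-1, 1) for _ in range(8)]
speaker_2 = [rng.uniform(-1, 1) for _ in range(10)]
noise = [0.1 * rng.uniform(-1, 1) for _ in range(12)]

clean_mixture = mix_sources([speaker_1, speaker_2])         # clean condition
noisy_mixture = mix_sources([speaker_1, speaker_2], noise)  # noisy condition
```

A separation model would take `clean_mixture` or `noisy_mixture` as input and be trained to recover the individual speaker signals.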
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
