REAL-M: Towards Speech Separation on Real Mixtures

Cem Subakan; Mirco Ravanelli; Samuele Cornell; François Grondin

arXiv:2110.10812 · eess.AS · October 22, 2021

TL;DR

This paper introduces the REAL-M dataset of real-world speech mixtures and proposes a neural estimator for evaluating separation performance without ground truth, demonstrating its reliability and correlation with human judgment.

Contribution

The paper releases a new real-life speech mixture dataset and develops a blind neural SI-SNR estimator for performance evaluation without ground truth.

Findings

01. The SI-SNR estimator reliably evaluates separation performance on real mixtures.

02. The estimator's predictions correlate well with human opinions.

03. Performance trends on REAL-M match those on synthetic benchmarks.

Abstract

In recent years, deep-learning-based source separation has achieved impressive results. Most studies, however, still evaluate separation models on synthetic datasets, while the performance of state-of-the-art techniques on in-the-wild speech data remains an open question. This paper helps fill this gap in two ways. First, we release the REAL-M dataset, a crowd-sourced corpus of real-life mixtures. Second, we address the problem of performance evaluation on real-life mixtures, where the ground truth is not available. We bypass this issue by carefully designing a blind Scale-Invariant Signal-to-Noise Ratio (SI-SNR) neural estimator. Through a user study, we show that our estimator reliably evaluates the separation performance on real mixtures: the performance predictions of the SI-SNR estimator correlate well with human opinions. Moreover, we observe that the performance…
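For readers unfamiliar with the metric, the following is a minimal sketch of the standard reference-based SI-SNR computation, i.e. the quantity that the blind estimator described above is meant to predict when no ground truth is available. It is not the paper's neural estimator and is not taken from the SpeechBrain recipe; the function name `si_snr`, the `eps` parameter, and the use of plain Python lists are illustrative assumptions.

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Reference-based SI-SNR in dB (illustrative sketch, not the paper's estimator).

    `estimate` and `reference` are equal-length sequences of floats.
    `eps` is an assumed small constant that guards against division by zero.
    """
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    n = len(reference)

    # Remove the mean so that a constant offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference to obtain the scaled target.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]

    # Everything in the estimate that is not explained by the target is noise.
    noise = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is the "scale-invariant" part of the name. On real recordings such as those in REAL-M the `reference` argument does not exist, which is why a blind estimator is needed in the first place.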

Code & Models

Repositories

speechbrain/speechbrain/tree/develop/recipes/REAL-M/sisnr-estimation (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing