SMS-WSJ: Database, performance measures, and baseline recipe for multi-channel source separation and recognition

Lukas Drude; Jens Heitkaemper; Christoph Boeddeker; Reinhold Haeb-Umbach

arXiv:1910.13934·cs.SD·October 31, 2019·58 cites

TL;DR

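This paper introduces SMS-WSJ (Spatialized Multi-Speaker Wall Street Journal), a multi-channel database of artificially mixed, overlapping speech built from all WSJ0+1 utterances with strictly separated speaker sets for training, validation, and test. It also offers a critical assessment of source separation performance measures and provides a source separation baseline and a Kaldi recipe as common ground for evaluation.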

Contribution

The paper presents a new multi-channel speech database, detailed evaluation procedures, and baseline algorithms for source separation and recognition tasks.

Findings

01 High variability in spatialization improves robustness of separation algorithms

02 Baseline methods achieve competitive word error rates on the new database

03 Critical assessment of source separation performance measures provided

Abstract

We present a multi-channel database of overlapping speech for training, evaluation, and detailed analysis of source separation and extraction algorithms: SMS-WSJ -- Spatialized Multi-Speaker Wall Street Journal. It consists of artificially mixed speech taken from the WSJ database, but unlike earlier databases we consider all WSJ0+1 utterances and take care of strictly separating the speaker sets present in the training, validation and test sets. When spatializing the data we ensure a high degree of randomness w.r.t. room size, array center and rotation, as well as speaker position. Furthermore, this paper offers a critical assessment of recently proposed measures of source separation performance. Alongside the code to generate the database we provide a source separation baseline and a Kaldi recipe with competitive word error rates to provide common ground for evaluation.
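
The randomized spatialization described in the abstract (room size, array center and rotation, speaker position) can be illustrated with a minimal sketch. This is not the authors' generation code: the `Scenario` class, the `sample_scenario` function, the wall margin, and every numeric range are hypothetical placeholders chosen only to show the idea of drawing each geometric quantity independently at random.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Scenario:
    room_size: tuple         # (length, width, height) in metres
    array_center: tuple      # (x, y, z) in metres
    array_rotation: float    # rotation about the vertical axis, in radians
    speaker_positions: list  # one (x, y, z) tuple per speaker


def sample_scenario(rng, num_speakers=2):
    """Draw one random room/array/speaker geometry.

    All ranges are hypothetical placeholders, not the SMS-WSJ settings.
    """
    # Room dimensions: one uniform draw per axis.
    ranges = [(6.0, 10.0), (5.0, 8.0), (2.5, 3.5)]
    room = tuple(rng.uniform(low, high) for low, high in ranges)

    margin = 1.0  # assumed minimum distance to any wall, in metres

    def point_inside():
        # Uniform point inside the room, at least `margin` from each wall.
        return tuple(rng.uniform(margin, dim - margin) for dim in room)

    center = point_inside()
    rotation = rng.uniform(0.0, 2.0 * math.pi)
    speakers = [point_inside() for _ in range(num_speakers)]
    return Scenario(room, center, rotation, speakers)


# A seeded generator makes the sampled geometry reproducible.
scenario = sample_scenario(random.Random(0))
print(scenario.room_size, scenario.array_rotation)
```

Passing an explicit `random.Random` instance rather than using the module-level generator keeps each sampled scenario reproducible from its seed, which matters when a database is meant to be regenerated identically by different users.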

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Blind Source Separation Techniques
