SMS-WSJ: Database, performance measures, and baseline recipe for multi-channel source separation and recognition
Lukas Drude, Jens Heitkaemper, Christoph Boeddeker, Reinhold Haeb-Umbach

TL;DR
This paper introduces SMS-WSJ, a comprehensive multi-channel speech database with detailed evaluation protocols, and provides baseline separation methods and performance measures to advance research in multi-speaker source separation and recognition.
Contribution
The paper presents a new multi-channel speech database, detailed evaluation procedures, and baseline algorithms for source separation and recognition tasks.
Findings
High variability in spatialization improves robustness of separation algorithms
Baseline methods achieve competitive word error rates on the new database
Critical assessment of source separation performance measures provided
Abstract
We present a multi-channel database of overlapping speech for training, evaluation, and detailed analysis of source separation and extraction algorithms: SMS-WSJ -- Spatialized Multi-Speaker Wall Street Journal. It consists of artificially mixed speech taken from the WSJ database, but unlike earlier databases we consider all WSJ0+1 utterances and take care of strictly separating the speaker sets present in the training, validation and test sets. When spatializing the data we ensure a high degree of randomness w.r.t. room size, array center and rotation, as well as speaker position. Furthermore, this paper offers a critical assessment of recently proposed measures of source separation performance. Alongside the code to generate the database we provide a source separation baseline and a Kaldi recipe with competitive word error rates to provide common ground for evaluation.
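The randomized spatialization described in the abstract can be illustrated with a minimal sketch. It is not the authors' generation code: the function name `sample_scene`, the dictionary layout, and every numeric range below are hypothetical and chosen only so that the example is self-contained. It draws a random room size, array center, array rotation, and speaker positions for one simulated mixture.

```python
import math
import random


def sample_scene(rng, num_speakers=2):
    """Draw one random room/array/speaker geometry (all units in meters).

    Every range here is an illustrative assumption, not a value from the paper.
    """
    # Random room size: length, width, height.
    room = (rng.uniform(5.0, 10.0), rng.uniform(4.0, 8.0), rng.uniform(2.5, 3.5))

    # Array center: near the middle of the floor plan, with a small random offset.
    center = (
        room[0] / 2 + rng.uniform(-0.25, 0.25),
        room[1] / 2 + rng.uniform(-0.25, 0.25),
        rng.uniform(1.0, 1.5),
    )

    # Random rotation of the array around the vertical axis, in radians.
    rotation = rng.uniform(0.0, 2.0 * math.pi)

    # Each speaker gets a random azimuth, distance, and height relative to the array.
    speakers = []
    for _ in range(num_speakers):
        azimuth = rng.uniform(0.0, 2.0 * math.pi)
        distance = rng.uniform(0.5, 1.5)
        speakers.append((
            center[0] + distance * math.cos(azimuth),
            center[1] + distance * math.sin(azimuth),
            rng.uniform(1.2, 1.8),
        ))

    return {"room": room, "center": center, "rotation": rotation, "speakers": speakers}


if __name__ == "__main__":
    # A fixed seed makes the sampled geometry reproducible.
    print(sample_scene(random.Random(0)))
```

Passing a seeded `random.Random` instance keeps each scene reproducible, which matters when a database has to be regenerated identically from code. The chosen ranges keep every speaker inside the room, since the array center is always at least 1.75 m from each wall and speakers are at most 1.5 m away from it.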
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Blind Source Separation Techniques
Methods: Test
