A comprehensive study of speech separation: spectrogram vs waveform separation
Fahimeh Bahmaninezhad, Jian Wu, Rongzhi Gu, Shi-Xiong Zhang, Yong Xu, Meng Yu, and Dong Yu

TL;DR
This paper compares spectrogram and waveform speech separation methods, introduces a frequency-domain optimization approach, and demonstrates significant improvements in multi-channel scenarios for naturalistic audio data.
Contribution
It integrates TasNet components into frequency-domain methods, proposes direct optimization of separation criteria, and develops multi-channel solutions utilizing spectral, spatial, and speaker information.
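The multi-channel models draw on spatial information. One widely used spatial feature in frequency-domain multi-channel separation is the inter-channel phase difference (IPD) between a microphone pair. Whether the paper uses exactly this feature, and in what form, is an assumption here. The minimal sketch below computes cos(IPD) for each time-frequency bin from the STFTs of two channels, each given as a list of frames of complex bins. The function name `cos_ipd` and this input layout are hypothetical.

```python
import cmath
import math


def cos_ipd(stft_ch_a, stft_ch_b):
    """Return cos(IPD) for every time-frequency bin of one microphone pair.

    Hypothetical helper: each argument is a list of frames, and each frame is
    a list of complex STFT bins. The two channels must share the same shape.
    """
    if len(stft_ch_a) != len(stft_ch_b):
        raise ValueError("channels must have the same number of frames")
    features = []
    for frame_a, frame_b in zip(stft_ch_a, stft_ch_b):
        if len(frame_a) != len(frame_b):
            raise ValueError("frames must have the same number of bins")
        # The phase difference per bin; taking the cosine removes the
        # 2*pi wrapping ambiguity of the raw difference.
        features.append(
            [math.cos(cmath.phase(a) - cmath.phase(b)) for a, b in zip(frame_a, frame_b)]
        )
    return features
```

Identical channels yield 1.0 in every bin. A 90-degree phase offset between the microphones yields values near 0. The cosine is used here instead of the raw angle difference so that the feature stays continuous across the ±π boundary.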
Findings
Spectrogram separation achieves performance competitive with waveform separation (TasNet) when it adopts an improved network design.
Multi-channel approaches significantly reduce word error rate and improve SDR.
The proposed methods outperform single-channel baselines in reverberant, naturalistic audio simulations.
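The findings report word error rate (WER) reductions. WER is conventionally computed as the word-level edit distance between the reference and the recognizer output (substitutions + deletions + insertions), divided by the number of reference words. The sketch below implements that textbook definition with whitespace tokenization. It is not the paper's scoring pipeline, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words (dynamic programming row).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (r != h)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat", "the bat sat")` gives one substitution over three reference words, about 0.333. Because insertions count against the score, WER can exceed 1.0.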
Abstract
Speech separation has been studied widely for single-channel close-talk microphone recordings over the past few years, and the developed solutions mostly operate in the frequency domain. Recently, a raw audio waveform separation network (TasNet) was introduced for single-channel data, achieving high Si-SNR (scale-invariant source-to-noise ratio) and SDR (source-to-distortion ratio) compared with the state-of-the-art frequency-domain solution. In this study, we incorporate effective components of TasNet into a frequency-domain separation method. We compare both approaches in alternative scenarios. We introduce a solution for directly optimizing the separation criterion in frequency-domain networks. In addition to objective and subjective speech separation measurements, we also evaluate the separation performance on a speech recognition task. We study the speech separation problem for far-field…
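The abstract reports Si-SNR and mentions directly optimizing the separation criterion. The standard definition first makes both signals zero-mean. It then projects the estimate ŝ onto the reference s to get s_target = (⟨ŝ, s⟩ / ‖s‖²)·s and treats the residual e_noise = ŝ − s_target as noise. The score is SI-SNR = 10·log10(‖s_target‖² / ‖e_noise‖²). The minimal sketch below follows that definition. The names `si_snr` and `neg_si_snr_loss` are hypothetical. Using negative SI-SNR on the reconstructed waveform as a training loss is one common way to optimize the criterion directly, and is assumed here rather than taken from the paper.

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant source-to-noise ratio (dB) of two equal-length signals."""
    if len(estimate) != len(reference) or not reference:
        raise ValueError("signals must be non-empty and of equal length")
    n = len(reference)
    # Zero-mean both signals so a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # s_target: projection of the estimate onto the reference direction.
    dot = sum(e * r for e, r in zip(est, ref))
    scale = dot / (sum(r * r for r in ref) + eps)
    target = [scale * r for r in ref]
    # e_noise: everything in the estimate not explained by the reference.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


def neg_si_snr_loss(estimate, reference):
    """Hypothetical training loss: minimizing it maximizes SI-SNR."""
    return -si_snr(estimate, reference)
```

Because of the projection step, multiplying the estimate by any nonzero gain leaves the score essentially unchanged. This is the "scale-invariant" property that distinguishes SI-SNR from plain SNR.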
Taxonomy
Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Blind Source Separation Techniques
