A comprehensive study of speech separation: spectrogram vs waveform separation

Fahimeh Bahmaninezhad; Jian Wu; Rongzhi Gu; Shi-Xiong Zhang; Yong Xu; Meng Yu; Dong Yu

arXiv:1905.07497·cs.SD·July 25, 2019·1 cites

Open Access

TL;DR

This paper compares spectrogram and waveform speech separation methods, introduces a frequency-domain optimization approach, and demonstrates significant improvements in multi-channel scenarios for naturalistic audio data.

Contribution

It integrates TasNet components into frequency-domain methods, proposes direct optimization of separation criteria, and develops multi-channel solutions utilizing spectral, spatial, and speaker information.

Findings

01. Spectrogram separation achieves competitive performance with improved network design.

02. Multi-channel approaches significantly reduce word error rate and improve SDR.

03. The proposed methods outperform single-channel baselines in reverberant, naturalistic audio simulations.

Abstract

Speech separation has been studied widely for single-channel close-talk microphone recordings over the past few years; the solutions developed are mostly in the frequency domain. Recently, a raw audio waveform separation network (TasNet) was introduced for single-channel data, achieving high Si-SNR (scale-invariant source-to-noise ratio) and SDR (source-to-distortion ratio) compared with the state-of-the-art frequency-domain solution. In this study, we incorporate effective components of TasNet into a frequency-domain separation method and compare the two approaches in alternative scenarios. We introduce a solution for directly optimizing the separation criterion in frequency-domain networks. In addition to objective and subjective speech separation measurements, we evaluate separation performance on a speech recognition task. We study the speech separation problem for far-field…
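The abstract names Si-SNR as a separation criterion without defining it. The sketch below shows one common formulation from the waveform-separation literature: remove the mean of both signals, project the estimate onto the reference, and compare the energy of the projected part with the energy of the residual. The function name `si_snr`, the `eps` stabilizer, and the toy signals are assumptions made for illustration; this is not taken from the paper's own code.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length 1-D signals.

    Hypothetical helper for illustration; `eps` guards against division
    by zero and log of zero.
    """
    if len(estimate) != len(target):
        raise ValueError("signals must have the same length")
    n = len(target)

    # Remove the mean so a constant offset does not affect the score.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]

    # Project the estimate onto the target to get the target component.
    dot = sum(a * b for a, b in zip(e, t))
    t_energy = sum(b * b for b in t) + eps
    scale = dot / t_energy
    s_target = [scale * b for b in t]

    # Whatever is left after the projection is treated as noise.
    e_noise = [a - s for a, s in zip(e, s_target)]

    num = sum(s * s for s in s_target)
    den = sum(x * x for x in e_noise) + eps
    return 10.0 * math.log10(num / den + eps)


if __name__ == "__main__":
    # Toy signals (assumed): a sinusoid plus a small deterministic disturbance.
    target = [math.sin(0.1 * i) for i in range(200)]
    estimate = [s + 0.1 * math.cos(0.37 * i) for i, s in enumerate(target)]
    print(f"Si-SNR: {si_snr(estimate, target):.2f} dB")
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is the property the "scale-invariant" name refers to.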

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Blind Source Separation Techniques