Stepwise-Refining Speech Separation Network via Fine-Grained Encoding in High-order Latent Domain

Zengwei Yao; Wenjie Pei; Fanglin Chen; Guangming Lu; and David Zhang

arXiv:2110.04791 · eess.AS · February 1, 2022

TL;DR

This paper introduces SRSSN, a novel speech separation network that employs a stepwise, coarse-to-fine approach by learning high-order latent domains for more precise separation of mixed speech signals.

Contribution

The paper proposes a stepwise-refining framework that learns multiple latent domains, improving speech separation accuracy over existing single-domain methods.

Findings

01

Effective in clean and noisy environments

02

Improves speech separation accuracy

03

Enhances speech recognition performance

Abstract

The crux of single-channel speech separation is how to encode the mixture of signals into a latent embedding space in which the signals from different speakers can be precisely separated. Existing methods for speech separation either transform the speech signals into the frequency domain to perform separation or seek to learn a separable embedding space by constructing a latent domain based on convolutional filters. While the latter type of method achieves substantial improvement for speech separation, we argue that an embedding space defined by only one latent domain does not suffice to provide a thoroughly separable encoding space. In this paper, we propose the Stepwise-Refining Speech Separation Network (SRSSN), which follows a coarse-to-fine separation framework. It first learns a 1-order latent domain to define an encoding space and…
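To make the idea of a latent domain built from convolutional filters concrete, here is a minimal NumPy sketch of the single-latent-domain pipeline the abstract describes as prior work: encode the mixture with a filter bank, apply one mask per speaker, and decode by overlap-add. It is not the authors' SRSSN architecture. The filter bank, decoder basis, and mask scores are random placeholders standing in for learned parameters, and every function name and size here is an assumption made for illustration.

```python
import numpy as np

def frame_signal(x, win, hop):
    """Slice a 1-D signal into overlapping frames of length `win`."""
    n_frames = 1 + (len(x) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]  # shape: (n_frames, win)

def encode(x, filters, hop):
    """Project each frame onto the filter bank (a strided 1-D convolution), then ReLU."""
    frames = frame_signal(x, filters.shape[1], hop)
    return np.maximum(frames @ filters.T, 0.0)  # shape: (n_frames, n_filters)

def softmax_masks(scores):
    """Turn per-speaker scores into masks that sum to 1 across speakers."""
    e = np.exp(scores - scores.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def decode(latent, basis, hop, length):
    """Map latent frames back to a waveform by overlap-add (a transposed convolution)."""
    win = basis.shape[1]
    frames = latent @ basis  # shape: (n_frames, win)
    out = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + win] += frame
    return out

# Hypothetical sizes; a trained model would learn `filters`, `basis`, and the scores.
rng = np.random.default_rng(0)
win, hop, n_filters, n_speakers = 16, 8, 32, 2
mixture = rng.standard_normal(160)
filters = rng.standard_normal((n_filters, win))  # placeholder encoder filters
basis = rng.standard_normal((n_filters, win))    # placeholder decoder basis

latent = encode(mixture, filters, hop)
scores = rng.standard_normal((n_speakers,) + latent.shape)  # stand-in for a separator network
masks = softmax_masks(scores)
estimates = np.stack([decode(m * latent, basis, hop, len(mixture)) for m in masks])
```

Because the masks sum to one and the decoder is linear, the per-speaker estimates add up to the decoded unmasked encoding. With random parameters the estimates are not meaningful speech; the sketch only shows where a single latent domain sits in the pipeline.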
