Embedding Recurrent Layers with Dual-Path Strategy in a Variant of Convolutional Network for Speaker-Independent Speech Separation

Xue Yang; Changchun Bao

arXiv:2203.13574·eess.AS·June 17, 2022

TL;DR

This paper introduces a novel neural network architecture combining RNNs and a variant of convolutional networks with a dual-path strategy, achieving effective speaker-independent speech separation while balancing performance and computational efficiency.

Contribution

It proposes embedding RNNs into a convolutional network variant using a dual-path strategy, enabling better local and global feature learning for speech separation.
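As a rough illustration of what a dual-path strategy generally involves, the sketch below splits a frame sequence into overlapping chunks and then reads them along two paths: within each chunk (local context) and across chunks at a fixed position (global context). This is only a sketch under stated assumptions, not the authors' implementation: the function names, chunk length, hop size, and zero padding are all hypothetical, and the recurrent layers that would process each path are omitted.

```python
def segment(frames, chunk_len, hop):
    """Split a list of frames into overlapping chunks (hypothetical helper).

    The last chunk is zero-padded so every chunk has length chunk_len.
    """
    if chunk_len <= 0 or hop <= 0:
        raise ValueError("chunk_len and hop must be positive")
    chunks = []
    start = 0
    while True:
        chunk = frames[start:start + chunk_len]
        chunk = chunk + [0] * (chunk_len - len(chunk))  # pad the tail
        chunks.append(chunk)
        if start + chunk_len >= len(frames):
            break
        start += hop
    return chunks


def intra_chunk_paths(chunks):
    """Local path: one short sequence per chunk."""
    return [list(chunk) for chunk in chunks]


def inter_chunk_paths(chunks):
    """Global path: one sequence per in-chunk position, spanning all chunks."""
    return [list(column) for column in zip(*chunks)]


# Assumed toy input: eight scalar "frames", chunks of 4 with 50% overlap.
frames = list(range(1, 9))
chunks = segment(frames, chunk_len=4, hop=2)
local_paths = intra_chunk_paths(chunks)
global_paths = inter_chunk_paths(chunks)
```

In a dual-path model, one recurrent layer would typically run over each sequence in `local_paths` and a second over each sequence in `global_paths`, so that both paths are short even when the input is long.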

Findings

1. Effective separation performance on various datasets
2. Achieves a good balance between accuracy and computational efficiency
3. Gradual separation at multiple scales improves results

Abstract

Speaker-independent speech separation has achieved remarkable performance in recent years with the development of deep neural networks (DNNs). Various network architectures, from the traditional convolutional neural network (CNN) and recurrent neural network (RNN) to the more advanced transformer, have been carefully designed to improve separation performance. However, state-of-the-art models usually suffer from several computation-related flaws, such as large model size, high memory consumption, and high computational complexity. To balance performance against computational efficiency and to further explore the modeling ability of traditional network structures, we combine RNNs with a newly proposed variant of the convolutional network to address the speech separation problem. By embedding two RNNs into the basic block of this variant with the help of a dual-path strategy, the proposed…

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing