Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network

Xiaolin Hu (1), Kai Li (1), Weiyi Zhang (1), Yi Luo (2), Jean-Marie Lemercier (3), Timo Gerkmann (3)

(1) Department of Computer Science and Technology, Tsinghua University, Beijing, China
(2) Department of Electrical Engineering, Columbia University, NY, USA
(3) Department of Informatics, University of Hamburg, Hamburg, Germany

arXiv:2112.02321·cs.SD·December 7, 2021·22 cites

TL;DR

This paper applies a fully recurrent convolutional neural network (FRCNN) to speech separation and replaces the usual parallel (synchronous) stage updates with an asynchronous updating scheme, reporting better separation results with fewer parameters.

Contribution

The paper proposes a bio-inspired asynchronous updating scheme for FRCNNs, enhancing speech separation performance with fewer parameters and better computational efficiency.

Findings

01. Achieved better speech separation results than traditional synchronous models.
02. Reduced model complexity while maintaining high accuracy.
03. Balanced computational efficiency with state-of-the-art performance.

Abstract

Recent advances in the design of neural network architectures, in particular those specialized in modeling sequences, have provided significant improvements in speech separation performance. In this work, we propose to use a bio-inspired architecture called Fully Recurrent Convolutional Neural Network (FRCNN) to solve the separation task. This model contains bottom-up, top-down and lateral connections to fuse information processed at various time-scales represented by stages. In contrast to the traditional approach updating stages in parallel, we propose to first update the stages one by one in the bottom-up direction, then fuse information from adjacent stages simultaneously and finally fuse information from all stages to the bottom stage together. Experiments showed that this asynchronous updating scheme achieved significantly better results with much fewer parameters than…
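The three-step update order described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: plain averaging stands in for the learned convolutions, each stage is assumed to run at half the time resolution of the one below it, and the names `resample` and `async_update` are hypothetical. The input length is assumed to be divisible by 2 ** (num_stages - 1).

```python
from typing import List

Signal = List[float]


def resample(x: Signal, length: int) -> Signal:
    """Bring x to the target length (toy stand-in for learned up/down-sampling)."""
    n = len(x)
    if n == length:
        return list(x)
    if n < length:
        # Upsample by nearest-neighbour repetition.
        return [x[i * n // length] for i in range(length)]
    # Downsample by averaging equal-sized blocks.
    step = n // length
    return [sum(x[i * step:(i + 1) * step]) / step for i in range(length)]


def average(parts: List[Signal]) -> Signal:
    """Element-wise mean of equally long signals (toy stand-in for a fusion layer)."""
    return [sum(vals) / len(parts) for vals in zip(*parts)]


def async_update(bottom: Signal, num_stages: int = 3) -> Signal:
    # Step 1: update the stages one by one in the bottom-up direction.
    stages = [list(bottom)]
    for _ in range(num_stages - 1):
        prev = stages[-1]
        stages.append(resample(prev, len(prev) // 2))

    # Step 2: fuse each stage with its adjacent stages simultaneously.
    # All reads use the step-1 values; results go into a new list.
    fused = []
    for k, stage in enumerate(stages):
        neighbours = [stages[j] for j in (k - 1, k + 1) if 0 <= j < num_stages]
        fused.append(average([stage] + [resample(nb, len(stage)) for nb in neighbours]))

    # Step 3: fuse information from all stages into the bottom stage.
    return average([resample(s, len(bottom)) for s in fused])
```

Calling `async_update([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])` returns a signal of the same length as the input. In a synchronous scheme, by contrast, every stage would be updated in parallel at each step rather than in the sequence shown here.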

Code & Models

Repositories

JusperLee/AFRCNN-For-Speech-Separation · PyTorch · Official

Videos

Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network · SlidesLive

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing