SICRN: Advancing Speech Enhancement through State Space Model and Inplace Convolution Techniques

Changjiang Zhao; Shulin He; Xueliang Zhang

arXiv:2402.14225 · eess.AS · February 23, 2024 · 1 citation

Open Access

TL;DR

This paper introduces SICRN, a novel speech enhancement model combining a state space model and inplace convolution, which preserves signal structure and improves efficiency over traditional convolutional recurrent neural networks.

Contribution

The paper proposes SICRN, integrating a dual-path state space model with inplace convolution to enhance frequency and temporal modeling in speech enhancement.
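The dual-path idea can be illustrated with a toy example. The sketch below is not the paper's model: it assumes a scalar, fixed-coefficient discrete state space recurrence (x_k = a·x_{k-1} + b·u_k, y_k = c·x_k) and applies it first along frequency within each frame, then along time for each bin. The names `ssm_scan` and `dual_path` and all coefficient values are hypothetical, chosen only for illustration; a real model would use learned, multidimensional parameters, and a one-directional frequency scan like this one only lets each bin see the bins below it.

```python
def ssm_scan(seq, a=0.9, b=1.0, c=1.0):
    """Scalar discrete state space recurrence (illustrative, not learned):
    x_k = a * x_{k-1} + b * u_k,  y_k = c * x_k."""
    x = 0.0
    out = []
    for u in seq:
        x = a * x + b * u
        out.append(c * x)
    return out


def dual_path(spec, a_freq=0.5, a_time=0.9):
    """Scan along frequency within each frame, then along time per bin.

    spec is a list of frames; each frame is a list of frequency-bin values.
    The output keeps the same frames-by-bins layout.
    """
    # Frequency path: one scan across the bins of each frame.
    freq_out = [ssm_scan(frame, a=a_freq) for frame in spec]
    # Time path: one causal scan across frames for each frequency bin.
    n_bins = len(freq_out[0])
    columns = [
        ssm_scan([frame[f] for frame in freq_out], a=a_time)
        for f in range(n_bins)
    ]
    # Transpose back to the frames-by-bins layout.
    return [[columns[f][t] for f in range(n_bins)] for t in range(len(spec))]


# Hypothetical 2-frame, 3-bin input with a single impulse.
spec = [[1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0]]
out = dual_path(spec)
for frame in out:
    print(frame)
```

The single impulse in the first bin of the first frame spreads across the other bins (frequency path) and into the following frame (time path), while the number of frames and bins stays unchanged.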

Findings

01. Achieves performance close to the state of the art on the DNS dataset.

02. Reduces model parameters and computational complexity.

03. Maintains low algorithmic delay for real-time applications.

Abstract

Speech enhancement aims to improve speech quality and intelligibility, especially in noisy environments where background noise degrades speech signals. Deep learning methods currently achieve great success in speech enhancement, e.g., the representative convolutional recurrent neural network (CRN) and its variants. However, CRN typically employs consecutive downsampling and upsampling convolutions for frequency modeling, which destroys the inherent structure of the signal over frequency. Additionally, convolutional layers lack temporal modeling ability. To address these issues, we propose an innovative module combining a State space model and Inplace Convolution (SIC), which replaces the conventional convolution in CRN; the resulting model is called SICRN. Specifically, a dual-path multidimensional State space model captures global frequency dependencies and long-term temporal dependencies. Meanwhile,…
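The contrast between downsampling convolution and a convolution that leaves the frequency axis intact can be made concrete by counting frequency bins layer by layer. The sketch below uses the standard convolution output-length formula; it assumes that "inplace" means stride 1 along frequency with padding that keeps the bin count, and the layer settings (kernel 3, five layers) and the 161-bin input are hypothetical values chosen only for illustration, not taken from the paper.

```python
def conv_out_len(n_in, kernel, stride, padding):
    """Output length of a convolution along one axis (standard formula)."""
    return (n_in + 2 * padding - kernel) // stride + 1


def track_bins(n_bins, layers):
    """Return the frequency-bin count after each (kernel, stride, padding) layer."""
    sizes = [n_bins]
    for kernel, stride, padding in layers:
        sizes.append(conv_out_len(sizes[-1], kernel, stride, padding))
    return sizes


# Hypothetical settings for illustration only.
N_BINS = 161
downsampling_encoder = [(3, 2, 0)] * 5  # stride 2 along frequency
inplace_stack = [(3, 1, 1)] * 5         # stride 1 with padding (assumed "inplace")

crn_bins = track_bins(N_BINS, downsampling_encoder)
inplace_bins = track_bins(N_BINS, inplace_stack)

print(crn_bins)      # [161, 80, 39, 19, 9, 4]
print(inplace_bins)  # [161, 161, 161, 161, 161, 161]
```

Under these assumptions the strided encoder compresses 161 bins down to 4, so a decoder has to upsample to recover the original resolution, whereas the stride-1 stack keeps a one-to-one correspondence between input and output bins at every layer.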

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis

Methods: Conditional Relation Network · Convolution