SICRN: Advancing Speech Enhancement through State Space Model and Inplace Convolution Techniques
Changjiang Zhao, Shulin He, Xueliang Zhang

TL;DR
This paper introduces SICRN, a novel speech enhancement model combining a state space model and inplace convolution, which preserves signal structure and improves efficiency over traditional convolutional recurrent neural networks.
Contribution
The paper proposes SICRN, integrating a dual-path state space model with inplace convolution to enhance frequency and temporal modeling in speech enhancement.
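As a rough illustration of the dual-path idea, the sketch below runs a scalar linear state space recurrence first along frequency within each frame, then along time within each frequency bin. It is not the paper's parameterization: the function names, the scalar coefficients, and the one-directional frequency scan are all assumptions made for illustration.

```python
def ssm_scan(xs, a, b, c=1.0):
    """Scalar linear state space recurrence.

    h[n] = a * h[n-1] + b * x[n],  y[n] = c * h[n],  with h[-1] = 0.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def dual_path_ssm(spec, a_freq=0.5, b_freq=0.5, a_time=0.9, b_time=0.1):
    """Hypothetical dual-path pass over a (time, frequency) grid.

    spec is a list of frames; each frame is a list of frequency-bin values.
    The coefficients are illustrative constants, not learned parameters.
    """
    # Frequency path: scan across the bins inside every frame.
    freq_out = [ssm_scan(frame, a_freq, b_freq) for frame in spec]

    # Time path: scan across frames for every bin. The scan only looks
    # backwards, so frame t never depends on frames after t.
    n_bins = len(spec[0])
    columns = [
        ssm_scan([frame[k] for frame in freq_out], a_time, b_time)
        for k in range(n_bins)
    ]

    # Transpose back to (time, frequency); the grid size is unchanged.
    return [[columns[k][t] for k in range(n_bins)] for t in range(len(spec))]


spec = [[1.0, 0.0], [0.0, 0.0]]  # toy input: 2 frames, 2 bins
out = dual_path_ssm(spec)
```

Because the time scan is causal, changing a later frame leaves earlier output frames untouched, which is the property a low-delay model needs.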
Findings
Achieves performance close to state-of-the-art on the DNS dataset.
Reduces model parameters and computational complexity.
Maintains low algorithmic delay for real-time applications.
Abstract
Speech enhancement aims to improve speech quality and intelligibility, especially in noisy environments where background noise degrades speech signals. Deep learning methods have achieved great success in speech enhancement, e.g. the representative convolutional recurrent neural network (CRN) and its variants. However, CRN typically employs consecutive downsampling and upsampling convolutions for frequency modeling, which destroys the inherent structure of the signal over frequency. Additionally, convolutional layers lack temporal modeling ability. To address these issues, we propose an innovative module combining a State space model and Inplace Convolution (SIC) and use it to replace the conventional convolution in CRN; the resulting model is called SICRN. Specifically, a dual-path multidimensional State space model captures the global frequency dependencies and long-term temporal dependencies. Meanwhile,…
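The frequency-axis effect described in the abstract can be checked with the standard convolution output-length formula. The sketch below compares a stack of strided (downsampling) convolutions with a stack of stride-1 convolutions that keep the number of frequency bins fixed. Reading "inplace" as stride 1 with matching padding is an assumption here, as are the 161 bins, kernel size 3, and five layers, none of which come from the text above.

```python
def conv_out_len(n, kernel, stride, padding):
    """Output length of a convolution along one axis (standard formula)."""
    return (n + 2 * padding - kernel) // stride + 1


def stack_lengths(n_bins, layers, kernel, stride, padding):
    """Frequency-axis length after each of `layers` identical convolutions."""
    lengths = [n_bins]
    for _ in range(layers):
        lengths.append(conv_out_len(lengths[-1], kernel, stride, padding))
    return lengths


# Hypothetical settings: 161 frequency bins, kernel size 3, five layers.
downsampling = stack_lengths(161, layers=5, kernel=3, stride=2, padding=0)
inplace = stack_lengths(161, layers=5, kernel=3, stride=1, padding=1)

print(downsampling)  # [161, 80, 39, 19, 9, 4]
print(inplace)       # [161, 161, 161, 161, 161, 161]
```

With stride 2 the frequency axis shrinks to a handful of coarse bins, so each deep feature no longer corresponds to one frequency. With stride 1 and padding 1 every layer keeps a one-to-one mapping to the input bins.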
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis
Methods: Convolutional Recurrent Network · Convolution
