Improving Speech Enhancement by Cross- and Sub-band Processing with State Space Model

Jizhen Li; Weiping Tu; Yuhong Yang; Xinmeng Xu; Yiqun Zhang; Yanzhen Ren

arXiv:2502.16207·cs.SD·February 25, 2025

Open Access

TL;DR

This paper introduces CSMamba, a novel speech enhancement method that employs cross- and sub-band processing with a state space model, improving high-frequency detail restoration and overall performance.

Contribution

The paper proposes a band split block and a spectrum restoration block to improve the SSM's handling of sub-band features and high-frequency information in speech enhancement.

Findings

01. Outperforms SOTA methods on DNS Challenge 2021 dataset

02. Uses fewer parameters than competing approaches

03. Achieves superior objective evaluation metrics

Abstract

Recently, the state space model (SSM) represented by Mamba has shown remarkable performance in long-term sequence modeling tasks, including speech enhancement. However, due to substantial differences in sub-band features, applying the same SSM to all sub-bands limits its inference capability. Additionally, when processing each time frame of the time-frequency representation, the SSM may forget certain high-frequency information of low energy, making the restoration of structure in the high-frequency bands challenging. For this reason, we propose Cross- and Sub-band Mamba (CSMamba). To assist the SSM in handling different sub-band features flexibly, we propose a band split block that splits the full-band into four sub-bands with different widths based on their information similarity. We then allocate independent weights to each sub-band, thereby reducing the inference burden on the SSM.…
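
The band split idea described in the abstract can be illustrated with a minimal sketch: cut the frequency axis of a time-frequency representation into four sub-bands of unequal width, then give each sub-band its own weight matrix. Everything specific here is an assumption made for illustration: the cut points in `edges`, the hidden size, the random stand-in spectrogram, and the helper names `split_bands` and `project_bands`. The paper chooses its band widths from information similarity, and those values are not given in this summary.

```python
import numpy as np

def split_bands(spec, edges):
    """Split a (freq, time) array into sub-bands along the frequency axis.

    `edges` holds the interior cut points, so three edges yield four bands.
    """
    return np.split(spec, edges, axis=0)

def project_bands(bands, weights):
    """Apply a separate weight matrix to each sub-band.

    Each weight has shape (hidden, band_width), so every band is mapped
    to the same hidden size regardless of how wide it is.
    """
    return [w @ b for w, b in zip(weights, bands)]

rng = np.random.default_rng(0)
n_freq, n_frames, hidden = 257, 100, 16

# Stand-in magnitude spectrogram (random values, not real speech).
spec = np.abs(rng.standard_normal((n_freq, n_frames)))

# Hypothetical cut points: narrow bands at low frequencies, wide at high.
edges = [16, 48, 112]
bands = split_bands(spec, edges)

# One independent weight matrix per sub-band (randomly initialised here).
weights = [0.1 * rng.standard_normal((hidden, b.shape[0])) for b in bands]
features = project_bands(bands, weights)

band_widths = [b.shape[0] for b in bands]
feature_shapes = [f.shape for f in features]
```

Because each band has its own projection, a downstream sequence model receives per-band features of a common size, which is one simple way to avoid forcing a single set of weights onto sub-bands with very different statistics.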

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques