Spiking Structured State Space Model for Monaural Speech Enhancement
Yu Du, Xu Liu, Yansong Chua

TL;DR
This paper introduces Spiking-S4, a speech enhancement model that combines the energy efficiency of spiking neural networks (SNNs) with the long-range sequence modeling of structured state space models (S4). It performs comparably to existing ANN methods while using fewer parameters and FLOPs.
Contribution
The paper presents Spiking-S4, a new model that merges SNNs with S4 for efficient long-sequence speech enhancement, addressing computational challenges of existing deep learning approaches.
Findings
Spiking-S4 performs comparably to existing ANN methods on benchmark datasets.
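Spiking-S4 uses fewer parameters and FLOPs than the ANN methods it is compared against, indicating lower computational cost.
Evaluation on the DNS Challenge and VoiceBank+Demand datasets supports the model's effectiveness at extracting clean speech from noisy signals.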
Abstract
Speech enhancement seeks to extract clean speech from noisy signals. Traditional deep learning methods face two challenges: making efficient use of information in long speech sequences, and high computational cost. To address these, we introduce the Spiking Structured State Space Model (Spiking-S4). This approach merges the energy efficiency of Spiking Neural Networks (SNNs) with the long-range sequence modeling capabilities of Structured State Space Models (S4). Evaluation on the DNS Challenge and VoiceBank+Demand datasets confirms that Spiking-S4 rivals existing Artificial Neural Network (ANN) methods while using fewer computational resources, as evidenced by reduced parameter counts and Floating Point Operations (FLOPs).
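To make the pairing of a state space layer with spiking neurons concrete, the following is a minimal sketch and not the authors' architecture, which this summary does not describe. It assumes a toy diagonal state matrix (not an S4 initialization), a bilinear discretization, and a simple leaky integrate-and-fire (LIF) neuron with hard reset. All function names, sizes, and constants (discretize, ssm_scan, lif_spikes, decay, threshold) are hypothetical choices made for illustration.

```python
import numpy as np

def discretize(A, B, dt):
    """Bilinear (Tustin) discretization of the continuous system x' = A x + B u."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - 0.5 * dt * A)
    return inv @ (I + 0.5 * dt * A), inv @ (dt * B)

def ssm_scan(A_bar, B_bar, C, u):
    """Run x[k] = A_bar x[k-1] + B_bar u[k], y[k] = C x[k] over a 1-D input."""
    x = np.zeros(A_bar.shape[0])
    y = np.empty(len(u))
    for k, u_k in enumerate(u):
        x = A_bar @ x + B_bar * u_k
        y[k] = C @ x
    return y

def lif_spikes(current, decay=0.9, threshold=1.0):
    """Leaky integrate-and-fire neuron with hard reset; returns a 0/1 spike train."""
    v = 0.0
    spikes = np.zeros(len(current))
    for k, i_k in enumerate(current):
        v = decay * v + i_k          # leaky integration of the input current
        if v >= threshold:           # emit a spike and reset the membrane potential
            spikes[k] = 1.0
            v = 0.0
    return spikes

rng = np.random.default_rng(0)
n = 4                                    # state size (assumed, for illustration)
A = -np.diag(np.arange(1.0, n + 1))      # stable toy state matrix, an assumption
B = np.ones(n)
C = rng.standard_normal(n)
A_bar, B_bar = discretize(A, B, dt=0.1)

u = rng.standard_normal(200)             # stand-in for a noisy signal frame
y = ssm_scan(A_bar, B_bar, C, u)         # long-range linear mixing of the input
spikes = lif_spikes(y)                   # sparse binary activations
```

The linear recurrence carries information across the whole sequence, while the LIF stage turns its output into sparse binary events. That sparsity is the usual argument for the energy efficiency of SNNs; how Spiking-S4 actually arranges and trains these components is specified in the paper itself.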
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Hearing Loss and Rehabilitation
