Spiking Structured State Space Model for Monaural Speech Enhancement

Yu Du; Xu Liu; Yansong Chua

arXiv:2309.03641 · cs.SD · April 23, 2024

TL;DR

This paper introduces Spiking-S4, a speech enhancement model that combines energy-efficient spiking neural networks (SNNs) with the long-range sequence modeling of structured state space models (S4). It performs comparably to existing artificial neural network (ANN) methods at lower computational cost.

Contribution

The paper presents Spiking-S4, a model that merges SNNs with S4 for speech enhancement. It targets two challenges of existing deep learning approaches: making efficient use of information in long speech sequences, and high computational cost.

Findings

01. Spiking-S4 performs comparably to existing ANN methods on benchmark datasets.

02. Spiking-S4 uses fewer parameters and floating-point operations (FLOPs) than those ANN methods, indicating lower computational cost.

03. Evaluation on the DNS Challenge and VoiceBank+Demand datasets confirms that Spiking-S4 is effective at enhancing noisy speech.

Abstract

Speech enhancement seeks to extract clean speech from noisy signals. Traditional deep learning methods face two challenges: efficiently using information in long speech sequences and high computational costs. To address these, we introduce the Spiking Structured State Space Model (Spiking-S4). This approach merges the energy efficiency of Spiking Neural Networks (SNN) with the long-range sequence modeling capabilities of Structured State Space Models (S4), offering a compelling solution. Evaluation on the DNS Challenge and VoiceBank+Demand Datasets confirms that Spiking-S4 rivals existing Artificial Neural Network (ANN) methods but with fewer computational resources, as evidenced by reduced parameters and Floating Point Operations (FLOPs).
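
To make the idea of pairing a state space model with spiking neurons concrete, here is a minimal sketch. It is not the authors' architecture. It assumes a scalar (one-dimensional) state, the bilinear discretization commonly used with S4, and a leaky integrate-and-fire (LIF) neuron with subtractive reset. All function names and parameter values (`a`, `b`, `c`, `dt`, `beta`, `threshold`) are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def discretize_bilinear(a: float, b: float, dt: float) -> Tuple[float, float]:
    """Bilinear discretization of the scalar system x'(t) = a*x(t) + b*u(t)."""
    denom = 1.0 - dt * a / 2.0
    a_bar = (1.0 + dt * a / 2.0) / denom
    b_bar = dt * b / denom
    return a_bar, b_bar


def ssm_scan(u: List[float], a_bar: float, b_bar: float, c: float) -> List[float]:
    """Run the recurrence x_k = a_bar*x_{k-1} + b_bar*u_k, y_k = c*x_k."""
    x = 0.0
    ys = []
    for u_k in u:
        x = a_bar * x + b_bar * u_k
        ys.append(c * x)
    return ys


def lif_spikes(currents: List[float], beta: float = 0.9,
               threshold: float = 1.0) -> List[int]:
    """Leaky integrate-and-fire neuron; emits 1 when the membrane
    potential reaches the threshold, then resets by subtraction."""
    v = 0.0
    spikes = []
    for i_k in currents:
        v = beta * v + i_k          # leak, then integrate the input current
        if v >= threshold:
            spikes.append(1)
            v -= threshold          # subtractive reset (an assumption here)
        else:
            spikes.append(0)
    return spikes


def spiking_ssm_layer(u: List[float], a: float = -1.0, b: float = 1.0,
                      c: float = 1.0, dt: float = 0.1) -> List[int]:
    """Illustrative layer: the SSM output drives a LIF neuron,
    so the layer's output is a binary spike train."""
    a_bar, b_bar = discretize_bilinear(a, b, dt)
    return lif_spikes(ssm_scan(u, a_bar, b_bar, c))


if __name__ == "__main__":
    # Constant input: the SSM state rises toward its steady value
    # and the neuron begins to fire.
    print(spiking_ssm_layer([1.0] * 50))
```

In this toy version the recurrence carries information across the sequence, while the neuron turns the continuous output into sparse binary events, which is where the energy savings of SNNs are usually argued to come from.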

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Hearing Loss and Rehabilitation