DPSNN: Spiking Neural Network for Low-Latency Streaming Speech Enhancement

Tao Sun; Sander Bohté

arXiv:2408.07388 · cs.SD · August 15, 2024

TL;DR

This paper introduces DPSNN, a spiking neural network framework for streaming speech enhancement. It combines spiking convolutional and spiking recurrent networks to reach a latency of about 5 ms while retaining high enhancement quality and energy efficiency.

Contribution

The paper proposes a two-phase streaming SNN architecture inspired by classical dual-path models. It reduces latency to about 5 ms, compared with the roughly 32 ms typical of current effective SNN solutions for speech enhancement.
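To make the latency figures concrete, the short sketch below converts a window duration into a number of audio samples. The 16 kHz sampling rate and the helper name `window_samples` are assumptions for illustration only; the summary does not state the sampling rate used in the paper.

```python
def window_samples(latency_ms: float, sample_rate_hz: int) -> int:
    """Return how many audio samples a window of the given duration spans."""
    return round(latency_ms * sample_rate_hz / 1000)


# Assumption: 16 kHz audio. The summary does not specify the sampling rate.
SAMPLE_RATE_HZ = 16_000

short_window = window_samples(5, SAMPLE_RATE_HZ)   # the ~5 ms latency reported for DPSNN
long_window = window_samples(32, SAMPLE_RATE_HZ)   # the ~32 ms typical of earlier SNN solutions

print(short_window, long_window)
```

Under that assumed rate, a 5 ms window covers far fewer samples than a 32 ms one, which is why a model that needs less context before producing output can respond sooner.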

Findings

01 Achieves approximately 5 ms latency, suitable for hearing aids

02 Demonstrates high SNR and perceptual quality in speech enhancement

03 Enhances energy efficiency through regularizer-based activation suppression
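As a rough illustration of regularizer-based activation suppression, the sketch below adds a penalty proportional to the mean firing rate to a task loss, so that training is nudged toward sparser spiking. This is one common form of activity regularizer and is an assumption here; the summary does not give the exact regularizer used in the paper. The names `firing_rate_penalty`, `total_loss`, and `strength` are hypothetical.

```python
def firing_rate_penalty(spikes, strength=0.01):
    """Penalty proportional to the mean firing rate of a spike raster.

    spikes: list of time steps, each a list of 0/1 values (one per neuron).
    strength: hypothetical weighting factor for the penalty.
    """
    total = sum(sum(step) for step in spikes)
    count = sum(len(step) for step in spikes)
    return strength * total / count if count else 0.0


def total_loss(task_loss, spikes, strength=0.01):
    """Task loss plus the activity penalty (assumed form, not the paper's)."""
    return task_loss + firing_rate_penalty(spikes, strength)


# Two time steps, four neurons: 2 spikes out of 8 possible.
raster = [[1, 0, 0, 1],
          [0, 0, 0, 0]]
loss = total_loss(task_loss=1.0, spikes=raster, strength=0.1)
print(loss)
```

Fewer spikes give a smaller penalty, which is the link to energy efficiency: on event-driven hardware, less spiking activity generally means less computation.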

Abstract

Speech enhancement (SE) improves communication in noisy environments, affecting areas such as automatic speech recognition, hearing aids, and telecommunications. With these domains typically being power-constrained and event-based while requiring low latency, neuromorphic algorithms in the form of spiking neural networks (SNNs) have great potential. Yet, current effective SNN solutions require a contextual sampling window imposing substantial latency, typically around 32 ms, too long for many applications. Inspired by dual-path models in classical neural networks, we develop a two-phase time-domain streaming SNN framework, the Dual-Path Spiking Neural Network (DPSNN). In the DPSNN, the first phase uses Spiking Convolutional Neural Networks (SCNNs) to capture global contextual information, while the second phase uses Spiking Recurrent Neural Networks (SRNNs) to…

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Acoustic Wave Phenomena Research

Methods: Spiking Neural Networks