sVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection with Spiking Neural Networks

Qu Yang; Qianhui Liu; Nan Li; Meng Ge; Zeyang Song; Haizhou Li

arXiv:2403.05772·cs.SD·March 12, 2024·1 cites

Open Access

TL;DR

This paper presents sVAD, a novel spiking neural network-based voice activity detection system that is robust to noise, low-power, and lightweight, suitable for real-world applications.

Contribution

The paper introduces a new SNN-based VAD model with an attention mechanism and demonstrates its effectiveness in noise robustness and low power consumption.

Findings

01. Achieves high noise robustness in VAD tasks
02. Maintains low power consumption and small model size
03. Outperforms existing VAD methods in noisy environments

Abstract

Speech applications are expected to be low-power and robust under noisy conditions. An effective Voice Activity Detection (VAD) front-end lowers the computational need. Spiking Neural Networks (SNNs) are known to be biologically plausible and power-efficient. However, SNN-based VADs have yet to achieve noise robustness and often require large models for high performance. This paper introduces a novel SNN-based VAD model, referred to as sVAD, which features an auditory encoder with an SNN-based attention mechanism. Particularly, it provides effective auditory feature representation through SincNet and 1D convolution, and improves noise robustness with attention mechanisms. The classifier utilizes Spiking Recurrent Neural Networks (sRNN) to exploit temporal speech information. Experimental results demonstrate that our sVAD achieves remarkable noise robustness and meanwhile maintains low…
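As a rough illustration of two ingredients the abstract names, the sketch below builds a SincNet-style windowed-sinc band-pass kernel and feeds its rectified output into a simple leaky integrate-and-fire (LIF) neuron. This is not the authors' implementation: the function names, the kernel size, the Hamming window, the decay and threshold values, and the hard-reset rule are all assumptions chosen for illustration, and the attention mechanism and sRNN classifier are omitted entirely.

```python
import numpy as np


def sinc_bandpass(f_low, f_high, kernel_size=101, sample_rate=16000):
    """Windowed-sinc band-pass kernel (difference of two low-pass sinc filters).

    All defaults are hypothetical values, not taken from the paper.
    """
    # Sample indices centered on zero so the kernel is symmetric.
    n = np.arange(kernel_size) - (kernel_size - 1) / 2

    def low_pass(cutoff_hz):
        # np.sinc(x) is sin(pi * x) / (pi * x), the normalized sinc.
        fc = 2.0 * cutoff_hz / sample_rate
        return fc * np.sinc(fc * n)

    # Subtracting two low-pass responses leaves the band between the cutoffs.
    return (low_pass(f_high) - low_pass(f_low)) * np.hamming(kernel_size)


def lif_spikes(current, decay=0.9, threshold=1.0):
    """Leaky integrate-and-fire neuron with hard reset; returns a 0/1 spike train."""
    v = 0.0
    spikes = np.zeros(len(current), dtype=np.int8)
    for t, i_t in enumerate(current):
        v = decay * v + i_t      # leak, then integrate the input current
        if v >= threshold:
            spikes[t] = 1
            v = 0.0              # assumed hard reset after a spike
    return spikes


def encode(signal, kernel):
    """Filter, rectify, and convert to spikes (a toy stand-in for an auditory encoder)."""
    # mode="valid" avoids edge samples computed from a truncated kernel.
    filtered = np.convolve(signal, kernel, mode="valid")
    return lif_spikes(np.abs(filtered))


sample_rate = 16000
t = np.arange(sample_rate // 10) / sample_rate       # 100 ms of samples
kernel = sinc_bandpass(500.0, 1500.0, sample_rate=sample_rate)

in_band = np.sin(2 * np.pi * 1000.0 * t)             # tone inside the pass band
out_of_band = np.sin(2 * np.pi * 4000.0 * t)         # tone far outside it

print("in-band spikes:    ", int(encode(in_band, kernel).sum()))
print("out-of-band spikes:", int(encode(out_of_band, kernel).sum()))
```

In this toy setup the in-band tone drives the neuron to fire while the out-of-band tone is attenuated below threshold, which hints at why sparse, event-driven computation can save energy: nothing is emitted when the filtered input carries little energy.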

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing