An Ultra-low Power RNN Classifier for Always-On Voice Wake-Up Detection Robust to Real-World Scenarios

Emmanuel Hardy; Franck Badets

arXiv:2103.04792·eess.AS·March 9, 2021·1 cites

TL;DR

This paper introduces an ultra-low power RNN-based classifier for an always-on voice wake-up sensor, designed to be robust to unseen real-world noise. The sensor aims to cut the power that always-on speech processing consumes in background noise by at least a factor of 100, while keeping the No Trigger Rate below 3%.

Contribution

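The authors designed a low-power RNN wake-up sensor that stays robust to unseen real-world noise at realistic speech and noise levels, which they attribute to careful design of the dataset and the loss function. Rather than following a traditional Voice Activity Detection (VAD) approach, the sensor is trained to mark only the start of speech.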

Findings

01. Less than 3% No Trigger Rate in noisy environments

02. Power consumption of 45 nW for the RNN in 65 nm CMOS

03. Memory footprint of only 0.52 kB

Abstract

We present an ultra-low power (ULP) Recurrent Neural Network (RNN) based classifier for an always-on voice Wake-Up Sensor (WUS) with performance suitable for real-world applications. The purpose of our sensor is to reduce by at least a factor of 100 the power consumption in background noise of always-on speech processing algorithms such as Automatic Speech Recognition, Keyword Spotting, and Speaker Verification. Unlike other published approaches, we designed our wake-up sensor to be robust to unseen real-world noises at realistic levels of speech and noise by carefully designing the dataset and the loss function. We also specifically trained it to mark only the speech start rather than adopting a traditional Voice Activity Detection (VAD) approach. We achieve less than 3% No Trigger Rate (NTR) for a duty cycle of less than 1% in challenging background noises pooled…
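The link between the "factor 100" power target and the "duty cycle less than 1%" figure can be illustrated with a short calculation. The sketch below is not taken from the paper. It assumes that the duty cycle is the fraction of time the downstream speech processor is awake in background noise, and that average power is the sensor power plus the duty cycle times the downstream power. The downstream figure of 100 µW is a hypothetical placeholder, and the 45 nW value covers only the RNN, not necessarily the whole sensor.

```python
def average_power(p_sensor_w: float, p_main_w: float, duty_cycle: float) -> float:
    """Average power of a gated system (simple assumed model).

    The wake-up sensor runs continuously; the main speech processor
    runs only for the fraction of time given by duty_cycle.
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return p_sensor_w + duty_cycle * p_main_w


def reduction_factor(p_sensor_w: float, p_main_w: float, duty_cycle: float) -> float:
    """Ratio of always-on main processor power to gated average power."""
    return p_main_w / average_power(p_sensor_w, p_main_w, duty_cycle)


P_RNN_W = 45e-9     # RNN power reported in the findings (RNN only)
P_MAIN_W = 100e-6   # hypothetical always-on speech processor, not from the paper
DUTY_CYCLE = 0.01   # upper bound quoted in the abstract

factor = reduction_factor(P_RNN_W, P_MAIN_W, DUTY_CYCLE)
print(f"Reduction factor at 1% duty cycle: {factor:.1f}")
```

Under this model the reduction factor approaches 1 / duty_cycle when the sensor power is negligible next to the downstream power, so reaching a factor of 100 or more requires a duty cycle somewhat below 1%.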

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques