Self-Learning for Personalized Keyword Spotting on Ultra-Low-Power Audio Sensors
Manuele Rusci, Francesco Paci, Marco Fariselli, Eric Flamand, Tinne Tuytelaars

TL;DR
This paper introduces a self-learning approach for personalized keyword spotting on ultra-low-power audio sensors: new audio is pseudo-labeled in real time on the device and then used to fine-tune the model without labeled data, improving accuracy at a low energy cost.
Contribution
It presents a novel self-learning method that incrementally personalizes KWS models on edge devices using pseudo-labels, removing the need for labeled data after deployment while keeping energy costs low.
Findings
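Accuracy improved by up to +19.2% and +16.0% on two public datasets, compared with the initial models pretrained on a large set of generic keywords.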
Real-time labeling at 8.2 mW power consumption.
On-device training energy cost is 10x lower than labeling energy.
Abstract
This paper proposes a self-learning method to incrementally train (fine-tune) a personalized Keyword Spotting (KWS) model after the deployment on ultra-low power smart audio sensors. We address the fundamental problem of the absence of labeled training data by assigning pseudo-labels to the new recorded audio frames based on a similarity score with respect to few user recordings. By experimenting with multiple KWS models with a number of parameters up to 0.5M on two public datasets, we show an accuracy improvement of up to +19.2% and +16.0% vs. the initial models pretrained on a large set of generic keywords. The labeling task is demonstrated on a sensor system composed of a low-power microphone and an energy-efficient Microcontroller (MCU). By efficiently exploiting the heterogeneous processing engines of the MCU, the always-on labeling task runs in real-time with an average power cost…
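The pseudo-labeling step described in the abstract can be illustrated with a minimal sketch. It assumes that each audio frame has already been mapped to an embedding vector by the KWS model, that the few user recordings are averaged into a single prototype, and that cosine similarity with two thresholds decides the label. The function names, the threshold values (`pos_thr`, `neg_thr`), and the choice of cosine similarity are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    x = np.asarray(x, dtype=np.float64)
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def build_prototype(user_embeddings):
    """Average the (normalized) embeddings of the few user recordings."""
    return l2_normalize(l2_normalize(user_embeddings).mean(axis=0))


def pseudo_label(frame_embeddings, prototype, pos_thr=0.8, neg_thr=0.3):
    """Assign pseudo-labels from the similarity to the user prototype.

    Returns (labels, sims) where labels holds
      1  -> pseudo-positive (similar to the user's keyword),
      0  -> pseudo-negative (clearly dissimilar),
     -1  -> discarded (too ambiguous to use for fine-tuning).
    Both thresholds are hypothetical values chosen for illustration.
    """
    sims = l2_normalize(frame_embeddings) @ prototype
    labels = np.full(sims.shape, -1, dtype=np.int64)
    labels[sims >= pos_thr] = 1
    labels[sims <= neg_thr] = 0
    return labels, sims


# Toy 2-D embeddings standing in for real model outputs (assumed data).
user_embeddings = np.array([[1.0, 0.0], [0.9, 0.1]])
new_frames = np.array([[1.0, 0.05], [0.0, 1.0], [1.0, 1.0]])

prototype = build_prototype(user_embeddings)
labels, sims = pseudo_label(new_frames, prototype)
print(labels, np.round(sims, 3))
```

Only frames labeled 0 or 1 would be kept for the incremental fine-tuning step; discarding the ambiguous middle band is one simple way to limit the effect of wrong pseudo-labels.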
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Advanced Chemical Sensor Technologies
Methods: Sparse Evolutionary Training · Self-Learning
