Self-Learning for Personalized Keyword Spotting on Ultra-Low-Power Audio Sensors

Manuele Rusci; Francesco Paci; Marco Fariselli; Eric Flamand; Tinne Tuytelaars

arXiv:2408.12481·cs.SD·March 10, 2025

TL;DR

This paper introduces a self-learning approach for personalized keyword spotting on ultra-low-power audio sensors, enabling real-time on-device model fine-tuning without labeled data, with significant accuracy improvements and low energy consumption.

Contribution

It presents a novel self-learning method that incrementally personalizes KWS models on edge devices using pseudo-labels, reducing reliance on labeled data and energy costs.

Findings

01

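Accuracy improved by up to +19.2% and +16.0% on two public datasets, relative to the initial models pretrained on a large set of generic keywords.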

02

Real-time labeling at 8.2 mW power consumption.

03

On-device training energy cost is 10x lower than labeling energy.

Abstract

This paper proposes a self-learning method to incrementally train (fine-tune) a personalized Keyword Spotting (KWS) model after deployment on ultra-low-power smart audio sensors. We address the fundamental problem of the absence of labeled training data by assigning pseudo-labels to newly recorded audio frames based on a similarity score with respect to a few user recordings. By experimenting with multiple KWS models with up to 0.5M parameters on two public datasets, we show an accuracy improvement of up to +19.2% and +16.0% vs. the initial models pretrained on a large set of generic keywords. The labeling task is demonstrated on a sensor system composed of a low-power microphone and an energy-efficient Microcontroller (MCU). By efficiently exploiting the heterogeneous processing engines of the MCU, the always-on labeling task runs in real-time with an average power cost…
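The pseudo-labeling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each audio frame has already been mapped to an embedding vector by the pretrained KWS model, uses cosine similarity to the mean of the user's enrollment embeddings as the similarity score, and applies two hypothetical thresholds (`pos_thr`, `neg_thr`) to decide between a positive label, a negative label, or discarding the frame. All function names and threshold values are illustrative assumptions.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def build_prototype(enrolled):
    """Average the unit-length enrollment embeddings into one keyword prototype.

    `enrolled` is a list of embedding vectors from the few user recordings
    (hypothetical input; the embedding extractor is not shown here).
    """
    unit = [_normalize(e) for e in enrolled]
    dim = len(unit[0])
    mean = [sum(u[i] for u in unit) / len(unit) for i in range(dim)]
    return _normalize(mean)


def pseudo_label(frames, prototype, pos_thr=0.8, neg_thr=0.3):
    """Assign a pseudo-label to each new frame embedding.

    Returns 1 (keyword) if the cosine similarity is >= pos_thr,
    0 (not the keyword) if it is <= neg_thr, and -1 (discard) otherwise.
    The threshold values are assumptions chosen for illustration only.
    """
    labels = []
    for frame in frames:
        sim = sum(a * b for a, b in zip(_normalize(frame), prototype))
        if sim >= pos_thr:
            labels.append(1)
        elif sim <= neg_thr:
            labels.append(0)
        else:
            labels.append(-1)
    return labels


# Toy 2-D embeddings: two enrollment recordings and three new frames.
prototype = build_prototype([[1.0, 0.0], [1.0, 0.1]])
labels = pseudo_label([[1.0, 0.05], [0.0, 1.0], [1.0, 1.0]], prototype)
print(labels)  # [1, 0, -1]
```

In this toy run the first frame is close to the enrolled keyword and gets a positive label, the second is nearly orthogonal and gets a negative label, and the third falls between the thresholds and is discarded. Only the confidently labeled frames would then be passed on to fine-tuning.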

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Advanced Chemical Sensor Technologies

Methods: Sparse Evolutionary Training · Self-Learning