Boosting keyword spotting through on-device learnable user speech characteristics
Cristian Cioflan, Lukas Cavigelli, Luca Benini

TL;DR
This paper introduces a lightweight on-device learning system for keyword spotting that adapts to individual users' speech characteristics. It reduces error rates by up to 19% for unseen speakers in TinyML settings, at low computational cost.
Contribution
It proposes an architecture that fuses features from a pretrained backbone with a user-aware embedding that learns the speaker's speech characteristics. This enables on-device adaptation at low computational cost.
Findings
Error rate reduced by up to 19% (relative) for unseen speakers on the 35-class Google Speech Commands problem, from 30.1% to 24.3%.
Remains effective in few-shot, data-scarce scenarios.
Requires only 23.7k parameters and 1 MFLOP per epoch for training.
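The "up to 19%" figure is a relative reduction. The abstract reports error rates going from 30.1% to 24.3%, which gives the following arithmetic:

```latex
\text{relative reduction} = \frac{30.1\% - 24.3\%}{30.1\%} = \frac{5.8}{30.1} \approx 0.193 \approx 19\%
```

In absolute terms, the error rate drops by 5.8 percentage points.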
Abstract
Keyword spotting systems for always-on TinyML-constrained applications require on-site tuning to boost the accuracy of offline-trained classifiers when deployed in unseen inference conditions. Adapting to the speech peculiarities of target users requires many in-domain samples, often unavailable in real-world scenarios. Furthermore, current on-device learning techniques rely on computationally intensive and memory-hungry backbone update schemes, unfit for always-on, battery-powered devices. In this work, we propose a novel on-device learning architecture, composed of a pretrained backbone and a user-aware embedding learning the user's speech characteristics. The so-generated features are fused and used to classify the input utterance. For domain shifts generated by unseen speakers, we measure error rate reductions of up to 19% (from 30.1% to 24.3%) based on the 35-class problem of the…
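The abstract describes the architecture only at a high level: a pretrained backbone, a user-aware embedding, fused features, and a classifier. The following is a hypothetical, dependency-free Python illustration of that idea, not the authors' implementation. Several parts are assumptions. The backbone is a stand-in fixed random projection. Fusion is concatenation. The user embedding is a single learnable vector. Only the embedding and a linear head are trained with plain SGD on cross-entropy. The names `UserAwareKWS`, `fuse`, and `sgd_step` are illustrative.

```python
import math
import random


def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def matvec(W, x):
    """Dense matrix-vector product, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class UserAwareKWS:
    """Hypothetical sketch: frozen backbone + learnable user embedding + linear head."""

    def __init__(self, in_dim, feat_dim, emb_dim, n_classes, seed=0):
        rng = random.Random(seed)
        # Stand-in for the pretrained backbone: a fixed random projection, never updated.
        self.backbone = [[rng.gauss(0.0, 1.0 / math.sqrt(in_dim)) for _ in range(in_dim)]
                         for _ in range(feat_dim)]
        # Learnable user embedding (assumed: one zero-initialised vector per user).
        self.user_emb = [0.0] * emb_dim
        # Learnable linear classifier over the fused features.
        fused_dim = feat_dim + emb_dim
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(fused_dim)] for _ in range(n_classes)]
        self.b = [0.0] * n_classes

    def fuse(self, x):
        # Assumed fusion: concatenate backbone features with the user embedding.
        return relu(matvec(self.backbone, x)) + self.user_emb

    def _probs(self, h):
        logits = [z + bk for z, bk in zip(matvec(self.W, h), self.b)]
        return softmax(logits)

    def predict_proba(self, x):
        return self._probs(self.fuse(x))

    def sgd_step(self, x, label, lr=0.1):
        """One cross-entropy SGD step that updates only the head and the user embedding."""
        h = self.fuse(x)
        p = self._probs(h)
        loss = -math.log(p[label])
        # Gradient of the loss w.r.t. the logits: p - one_hot(label).
        g = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
        offset = len(h) - len(self.user_emb)
        # Back-propagate into the embedding slice of h before W is modified.
        g_emb = [sum(g[k] * self.W[k][offset + j] for k in range(len(g)))
                 for j in range(len(self.user_emb))]
        for k in range(len(g)):
            for j in range(len(h)):
                self.W[k][j] -= lr * g[k] * h[j]
            self.b[k] -= lr * g[k]
        for j in range(len(self.user_emb)):
            self.user_emb[j] -= lr * g_emb[j]
        return loss


# Example: adapt to one synthetic "utterance" feature vector labelled as class 2.
model = UserAwareKWS(in_dim=8, feat_dim=6, emb_dim=4, n_classes=3, seed=1)
x = [0.5, -0.2, 0.1, 0.9, -0.4, 0.3, 0.0, 0.7]
for _ in range(20):
    model.sgd_step(x, label=2)
```

Keeping the backbone frozen limits each training step to the small head and embedding. The abstract motivates this kind of saving when it criticises backbone update schemes. This summary does not say whether the paper freezes the backbone in exactly this way.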
Taxonomy
Topics: Speech and dialogue systems · Speech Recognition and Synthesis · Advanced Text Analysis Techniques
