Implementing Keyword Spotting on the MCUX947 Microcontroller with Integrated NPU
Petar Jakuš, Hrvoje Džapo

TL;DR
This paper demonstrates a real-time keyword spotting system on a microcontroller with an integrated NPU, achieving 97.06% accuracy and a 59x inference speedup over CPU-only execution, which makes it suitable for resource-constrained embedded devices.
Contribution
The paper introduces a CNN-based KWS system, optimized with Quantization Aware Training, running on an embedded microcontroller with an integrated NPU, enabling efficient voice recognition on low-power devices.
Findings
59x inference speedup with NPU
97.06% accuracy achieved
Model size of 30.58 KB
Abstract
This paper presents a keyword spotting (KWS) system implemented on the NXP MCXN947 microcontroller with an integrated Neural Processing Unit (NPU), enabling real-time voice interaction on resource-constrained devices. The system combines MFCC feature extraction with a CNN classifier, optimized using Quantization Aware Training to reduce model size with minimal accuracy drop. Experimental results show a 59x speedup in inference time when using the NPU compared to CPU-only execution, with 97.06% accuracy and a model size of 30.58 KB. These results demonstrate the feasibility of efficient, low-power voice interfaces on embedded platforms.
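To make the quantization step mentioned in the abstract more concrete, here is a minimal sketch of the "fake quantization" operation that Quantization Aware Training typically applies to weights in the forward pass: values are rounded to an integer grid and mapped back to floating point, so the network learns to tolerate the rounding error. This is not the authors' pipeline. The abstract does not state the quantization scheme, so symmetric per-tensor int8 is an assumption here, and the function name `fake_quantize` and the kernel shape are hypothetical.

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    """Symmetric per-tensor fake quantization (assumed scheme, num_bits <= 8).

    Returns the integer codes, the scale, and the dequantized float tensor.
    """
    qmax = 2 ** (num_bits - 1) - 1              # 127 for int8
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the symmetric range.
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    # Dequantize: this is the value the rest of the network would see.
    return q, scale, q.astype(np.float32) * scale

# Hypothetical 3x3 convolution kernel with 1 input and 8 output channels.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=(3, 3, 1, 8)).astype(np.float32)

q, scale, w_hat = fake_quantize(weights)
max_err = float(np.max(np.abs(weights - w_hat)))

print("float32 bytes:", weights.nbytes)
print("int8 bytes:   ", q.nbytes)
print("max abs rounding error:", max_err, "(bounded by scale / 2 =", scale / 2, ")")
```

Storing int8 codes instead of float32 values cuts weight storage by a factor of four, while the per-weight rounding error stays within half a quantization step. That is the general trade-off behind reducing model size with only a small accuracy drop.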
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Neural Networks and Applications
Methods: MFCC Feature Extraction · Convolutional Neural Network (CNN) · Quantization Aware Training
