Micro-power spoken keyword spotting on Xylo Audio 2

Hannah Bos; Dylan R. Muir

arXiv:2406.15112·cs.NE·June 24, 2024·2 cites

Open Access

TL;DR

This paper demonstrates a neuromorphic processor implementing a spoken keyword spotting benchmark with high accuracy and extremely low power consumption, showcasing the energy efficiency advantages of neuromorphic designs for edge audio processing.

Contribution

It presents the first implementation of the Aloha keyword spotting benchmark on the Xylo Audio 2 neuromorphic processor, achieving state-of-the-art energy efficiency and accuracy.

Findings

01 95% task accuracy on Aloha benchmark

02 Best-in-class inference power of 291 μW

03 Inference efficiency of 6.6 μJ per inference
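
The power and energy figures above are linked by the basic relation E = P × t. As a rough sanity check only, the sketch below computes the inference duration those two numbers would imply. It assumes the energy figure is simply the quoted power multiplied by the time taken by one inference, which this summary does not confirm. The function name and constants are illustrative and are not taken from the paper.

```python
def implied_inference_time_s(energy_j: float, power_w: float) -> float:
    """Return the duration t implied by E = P * t (assumed relation)."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return energy_j / power_w


# Figures quoted in the findings above.
POWER_W = 291e-6   # 291 uW inference power
ENERGY_J = 6.6e-6  # 6.6 uJ per inference

# Assumption: the energy figure equals power times duration of one inference.
duration_s = implied_inference_time_s(ENERGY_J, POWER_W)
print(f"implied duration per inference: {duration_s * 1e3:.1f} ms")
# prints "implied duration per inference: 22.7 ms"
```

If the paper measures energy differently (for example, by counting only part of the power budget), the implied duration would differ from this estimate.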

Abstract

For many years, designs for "Neuromorphic" or brain-like processors have been motivated by achieving extreme energy efficiency, compared with von Neumann and tensor processor devices. As part of their design language, Neuromorphic processors take advantage of weight, parameter, state and activity sparsity. In the extreme case, neural networks based on these principles mimic the sparse activity of biological nervous systems, in "Spiking Neural Networks" (SNNs). Few benchmarks are available for Neuromorphic processors that have been implemented for a range of Neuromorphic and non-Neuromorphic platforms, and which can therefore demonstrate the energy benefits of Neuromorphic processor designs. Here we describe the implementation of a spoken audio keyword-spotting (KWS) benchmark, "Aloha", on the Xylo Audio 2 (SYNS61210) Neuromorphic processor device. We obtained high deployed quantized task…

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis