TL;DR
This paper presents the first FPGA implementation of a keyword spotting system that combines a neuromorphic auditory sensor and a graph neural network for real-time, low-power audio processing.
Contribution
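It introduces an end-to-end FPGA system that integrates a Neuromorphic Auditory Sensor (NAS) and a graph neural network (GNN) on a single device, eliminating conventional signal preprocessing and enabling efficient, real-time keyword detection on edge devices.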
Findings
Accuracy of 87.43% on the Google Speech Commands v2 dataset.
End-to-end latency below 35 microseconds.
Power consumption of 1.12 W.
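The latency and power figures above can be combined into a rough energy bound. The sketch below is a back-of-the-envelope calculation, not a number reported by the paper. It assumes the 1.12 W is drawn for the whole latency window and that one window corresponds to one processed input; the summary states neither. The function name `energy_upper_bound_uj` is made up for this illustration.

```python
# Figures taken from the findings listed above.
POWER_W = 1.12       # reported power consumption, in watts
LATENCY_S = 35e-6    # reported upper bound on end-to-end latency, in seconds


def energy_upper_bound_uj(power_w: float, latency_s: float) -> float:
    """Energy in microjoules, ASSUMING constant power over the latency window."""
    return power_w * latency_s * 1e6


energy_uj = energy_upper_bound_uj(POWER_W, LATENCY_S)
print(f"Energy per latency window: under about {energy_uj:.1f} uJ")
```

Because the latency is given only as an upper bound ("below 35 microseconds"), the result is an upper bound under the stated assumptions, not a measured energy figure.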
Abstract
With the rapid growth of mobile robotics and embedded intelligence, there is an increasing demand for efficient on-device data processing on edge platforms. A promising research direction is the use of neuromorphic sensors inspired by human sensory systems, which generate sparse, event-based data encoding changes in the environment. In this work, we present the first end-to-end FPGA implementation of a keyword spotting system that integrates a Neuromorphic Auditory Sensor (NAS) and a graph neural network (GNN) on a single FPGA device, enabling real-time processing of raw audio data. The proposed architecture eliminates conventional signal preprocessing and operates directly on event-based audio streams. Leveraging a compute-near-memory network architecture, the system achieves efficient inference with low latency and low power consumption. Experimental results demonstrate an accuracy of…
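To make the idea of running a GNN directly on event-based audio more concrete, here is a minimal sketch of one generic way to turn a sparse event stream into a graph. It is not the construction used in the paper, which this summary does not describe. The event layout `(timestamp_us, channel)`, the function `build_event_graph`, and the thresholds `max_dt_us` and `max_dch` are all hypothetical choices for illustration.

```python
from typing import List, Tuple

# Hypothetical event layout: (timestamp in microseconds, frequency channel index).
Event = Tuple[int, int]


def build_event_graph(
    events: List[Event], max_dt_us: int, max_dch: int
) -> List[Tuple[int, int]]:
    """Return directed edges (src, dst) from earlier to later events.

    Two events are linked when they are close in both time and channel.
    Events are ASSUMED to be sorted by timestamp.
    """
    edges = []
    for dst, (t_dst, ch_dst) in enumerate(events):
        # Walk backwards over earlier events, newest first.
        for src in range(dst - 1, -1, -1):
            t_src, ch_src = events[src]
            if t_dst - t_src > max_dt_us:
                break  # every remaining event is older still
            if abs(ch_dst - ch_src) <= max_dch:
                edges.append((src, dst))
    return edges


# Toy event stream, invented for this example.
events = [(0, 3), (40, 4), (90, 10), (120, 5), (400, 4)]
edges = build_event_graph(events, max_dt_us=100, max_dch=1)
print(edges)
```

Edges point only from past to future events, so the graph can be extended incrementally as new events arrive, which is one reason this kind of construction suits streaming input.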
