Chipmunk: A Systolically Scalable 0.9 mm², 3.08 Gop/s/mW @ 1.2 mW Accelerator for Near-Sensor Recurrent Neural Network Inference
Francesco Conti, Lukas Cavigelli, Gianna Paulin, Igor Susmelj, Luca Benini

TL;DR
Chipmunk is a compact, energy-efficient hardware accelerator for LSTM RNN inference. It enables real-time voice processing on low-power devices and scales to larger models by combining multiple engines into a systolic array.
Contribution
It introduces a small (<1 mm² in UMC 65 nm), scalable accelerator architecture for LSTM RNNs that combines high energy efficiency with real-time processing, making it suitable for low-power edge devices.
Findings
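Achieves a measured peak efficiency of 3.08 Gop/s/mW at 1.24 mW peak power.
Supports large RNN models by letting multiple Chipmunk engines cooperate as a single systolic array, avoiding large memory-transfer overhead.
Enables real-time phoneme extraction on the RNN topology proposed by Graves et al. with a 75-tile configuration consuming less than 13 mW of average power.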
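As a back-of-the-envelope check, assuming the 3.08 Gop/s/mW peak efficiency and the 1.24 mW peak power are measured at the same operating point (as the abstract's wording suggests), the implied single-engine throughput and the average power budget per tile in the 75-tile configuration are roughly as follows. Both are derived estimates, not reported measurements.

$$
3.08\,\frac{\text{Gop/s}}{\text{mW}} \times 1.24\,\text{mW} \approx 3.82\,\text{Gop/s},
\qquad
\frac{13\,\text{mW}}{75\ \text{tiles}} \lesssim 0.17\,\text{mW per tile}.
$$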
Abstract
Recurrent neural networks (RNNs) are state-of-the-art in voice awareness/understanding and speech recognition. On-device computation of RNNs on low-power mobile and wearable devices would be key to applications such as zero-latency voice-based human-machine interfaces. Here we present Chipmunk, a small (<1 mm²) hardware accelerator for Long Short-Term Memory (LSTM) RNNs in UMC 65 nm technology, capable of operating at a measured peak efficiency of up to 3.08 Gop/s/mW at 1.24 mW peak power. To implement big RNN models without incurring huge memory transfer overhead, multiple Chipmunk engines can cooperate to form a single systolic array. In this way, the Chipmunk architecture in a 75-tile configuration can achieve real-time phoneme extraction on a demanding RNN topology proposed by Graves et al., consuming less than 13 mW of average power.
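The scaling idea in the abstract, splitting one large RNN across many engines so that each keeps its share of the weights locally, can be illustrated with a minimal pure-Python sketch. It is a hypothetical illustration, not Chipmunk's actual dataflow: the names `tiled_matvec`, `dense_matvec`, and `lstm_step`, the i/f/g/o gate ordering, the tile sizes, and the random weights are all assumptions. Each tile holds one block of the concatenated LSTM weight matrix and adds its partial product to a running sum passed along its tile row, loosely mimicking systolic accumulation; for simplicity the gate nonlinearities and state update are applied centrally afterwards.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def dense_matvec(W, z):
    """Reference: full matrix-vector product on a single engine."""
    return [sum(w * v for w, v in zip(row, z)) for row in W]


def tiled_matvec(W, z, tile_rows, tile_cols):
    """Hypothetical tiling: W is split into tile_rows x tile_cols blocks.

    Each block (one 'engine') multiplies only its locally stored weights
    by its slice of z and adds the result to the partial sums arriving
    from its left neighbour; the last tile in each row holds the result.
    """
    n_out, n_in = len(W), len(z)
    out = [0.0] * n_out
    for r0 in range(0, n_out, tile_rows):
        rows = range(r0, min(r0 + tile_rows, n_out))
        partial = [0.0] * len(rows)  # partial sums entering the first tile
        for c0 in range(0, n_in, tile_cols):
            cols = range(c0, min(c0 + tile_cols, n_in))
            # This tile touches only W[rows][cols] and z[cols].
            partial = [p + sum(W[i][j] * z[j] for j in cols)
                       for p, i in zip(partial, rows)]
        for p, i in zip(partial, rows):
            out[i] = p
    return out


def lstm_step(W, bias, x, h, c, matvec):
    """One LSTM time step; the gate order i, f, g, o is an assumption."""
    H = len(h)
    z = x + h  # concatenated input [x; h]
    a = [p + q for p, q in zip(matvec(W, z), bias)]
    i = [sigmoid(v) for v in a[0:H]]
    f = [sigmoid(v) for v in a[H:2 * H]]
    g = [math.tanh(v) for v in a[2 * H:3 * H]]
    o = [sigmoid(v) for v in a[3 * H:4 * H]]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


# Usage: compare a single-engine step with a tiled (multi-engine) step.
random.seed(0)
N_X, H = 6, 5
W = [[random.uniform(-0.5, 0.5) for _ in range(N_X + H)] for _ in range(4 * H)]
bias = [0.0] * (4 * H)
x = [random.uniform(-1.0, 1.0) for _ in range(N_X)]
h_init = [random.uniform(-1.0, 1.0) for _ in range(H)]
c_init = [0.0] * H

h_ref, c_ref = lstm_step(W, bias, x, h_init, c_init, dense_matvec)
h_til, c_til = lstm_step(W, bias, x, h_init, c_init,
                         lambda M, v: tiled_matvec(M, v, 4, 3))
```

In this sketch each tile reads only its own weight block, so only input slices and partial sums move between neighbouring tiles. The tiled result matches the dense one up to floating-point rounding, because only the order of the additions changes.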
