Chipmunk: A Systolically Scalable 0.9 mm$^2$, 3.08 Gop/s/mW @ 1.2 mW Accelerator for Near-Sensor Recurrent Neural Network Inference

Francesco Conti; Lukas Cavigelli; Gianna Paulin; Igor Susmelj; Luca Benini

arXiv:1711.05734·cs.DC·February 22, 2018

TL;DR

Chipmunk is a compact (<1 mm$^2$), energy-efficient hardware accelerator for LSTM RNN inference. Multiple Chipmunk engines can be combined into a systolic array, so that larger models run in real time on low-power devices.

Contribution

The paper introduces a small, systolically scalable LSTM accelerator architecture in UMC 65 nm technology that combines high energy efficiency with real-time processing, making it suitable for low-power mobile and wearable devices.

Findings

01

Achieves a measured peak efficiency of up to 3.08 Gop/s/mW at 1.24 mW peak power.

02

Supports large RNN models by letting multiple Chipmunk engines cooperate as a single systolic array, avoiding large memory-transfer overhead.

03

Enables real-time phoneme extraction on the RNN topology proposed by Graves et al., at less than 13 mW average power in a 75-tile configuration.
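The headline figures can be combined into two rough derived quantities. The short sketch below uses only the numbers quoted on this page. It assumes that the peak efficiency and the 1.24 mW figure refer to the same operating point (as the title suggests), and that the 13 mW bound is shared evenly across the 75 tiles. Neither assumption is confirmed here, so the results are estimates, not measurements.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the abstract.
# Assumption: efficiency and power are taken at the same operating point.
PEAK_EFFICIENCY_GOPS_PER_MW = 3.08   # measured peak efficiency
PEAK_POWER_MW = 1.24                 # power quoted alongside that efficiency
NUM_TILES = 75                       # systolic configuration from the abstract
ARRAY_AVG_POWER_BOUND_MW = 13.0      # "less than 13 mW" average power


def implied_throughput_gops(efficiency_gops_per_mw, power_mw):
    """Throughput implied by an efficiency (Gop/s/mW) at a given power (mW)."""
    return efficiency_gops_per_mw * power_mw


def per_tile_power_bound_mw(total_power_mw, tiles):
    """Upper bound on average power per tile, assuming an even split."""
    return total_power_mw / tiles


throughput = implied_throughput_gops(PEAK_EFFICIENCY_GOPS_PER_MW, PEAK_POWER_MW)
per_tile = per_tile_power_bound_mw(ARRAY_AVG_POWER_BOUND_MW, NUM_TILES)

print(f"Implied single-engine throughput: {throughput:.2f} Gop/s")
print(f"Average power per tile (upper bound): {per_tile:.3f} mW")
```

Under these assumptions a single engine delivers roughly 3.8 Gop/s, and each tile of the 75-tile array averages under about 0.17 mW.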

Abstract

Recurrent neural networks (RNNs) are state-of-the-art in voice awareness/understanding and speech recognition. On-device computation of RNNs on low-power mobile and wearable devices would be key to applications such as zero-latency voice-based human-machine interfaces. Here we present Chipmunk, a small (<1 mm$^2$) hardware accelerator for Long Short-Term Memory RNNs in UMC 65 nm technology, capable of operating at a measured peak efficiency of up to 3.08 Gop/s/mW at 1.24 mW peak power. To implement big RNN models without incurring huge memory transfer overhead, multiple Chipmunk engines can cooperate to form a single systolic array. In this way, the Chipmunk architecture in a 75-tile configuration can achieve real-time phoneme extraction on a demanding RNN topology proposed by Graves et al., consuming less than 13 mW of average power.
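The abstract does not describe the dataflow between engines, so the following is only a minimal illustrative sketch of one way a matrix-vector product (the dominant operation in an LSTM layer) could be partitioned over a grid of tiles. The function names, the grid shape, and the rule that each tile permanently holds one weight block are all hypothetical and are not taken from the paper. The sketch shows why keeping weights local to each tile avoids moving them: only input chunks and partial sums travel between tiles.

```python
import random


def dense_matvec(W, x):
    """Reference result: plain matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def tiled_matvec(W, x, tile_rows, tile_cols):
    """Compute W @ x with W partitioned over a tile_rows x tile_cols grid.

    Hypothetical model (not the paper's documented dataflow): tile (r, c)
    stores block (r, c) of W locally, receives chunk c of x, and adds its
    partial result to a running sum passed along grid row r.
    """
    n_out, n_in = len(W), len(x)
    assert n_out % tile_rows == 0 and n_in % tile_cols == 0
    bh, bw = n_out // tile_rows, n_in // tile_cols  # block height and width
    y = []
    for r in range(tile_rows):
        partial = [0.0] * bh  # accumulator flowing along grid row r
        for c in range(tile_cols):
            x_chunk = x[c * bw:(c + 1) * bw]  # input chunk seen by column c
            for i in range(bh):
                # Weights that stay resident in tile (r, c)
                w_row = W[r * bh + i][c * bw:(c + 1) * bw]
                partial[i] += sum(w * xi for w, xi in zip(w_row, x_chunk))
        y.extend(partial)
    return y


random.seed(0)
W = [[random.uniform(-1, 1) for _ in range(6)] for _ in range(4)]
x = [random.uniform(-1, 1) for _ in range(6)]

y_tiled = tiled_matvec(W, x, tile_rows=2, tile_cols=3)
y_dense = dense_matvec(W, x)
print(all(abs(a - b) < 1e-9 for a, b in zip(y_tiled, y_dense)))
```

In this toy model, each tile needs memory only for its own weight block, so adding tiles increases the supported model size without streaming weights from outside the array.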
