Vau da muntanialas: Energy-efficient multi-die scalable acceleration of RNN inference

Gianna Paulin; Francesco Conti; Lukas Cavigelli; Luca Benini

arXiv:2202.07462 · cs.LG · February 16, 2022

TL;DR

This paper introduces Muntaniala, an energy-efficient, scalable RNN accelerator architecture that leverages multi-die systolic arrays to improve performance and reduce power consumption in LSTM inference.

Contribution

The paper presents Muntaniala, a novel multi-die RNN accelerator architecture with demonstrated energy efficiency and scalability for large LSTM models, including a multi-chip prototype system.

Findings

01 Achieved 3.25 TOP/s/W energy efficiency in silicon

02 Demonstrated multi-chip array performing LSTM inference in 330 μs

03 Reduced system power to 9.0 mW at 10 MHz

Abstract

Recurrent neural networks such as Long Short-Term Memories (LSTMs) learn temporal dependencies by keeping an internal state, making them ideal for time-series problems such as speech recognition. However, the output-to-input feedback creates distinctive memory bandwidth and scalability challenges in designing accelerators for RNNs. We present Muntaniala, an RNN accelerator architecture for LSTM inference with a silicon-measured energy-efficiency of 3.25 TOP/s/W and performance of 30.53 GOP/s in UMC 65 nm technology. The scalable design of Muntaniala allows running large RNN models by combining multiple tiles in a systolic array. We keep all parameters stationary on every die in the array, drastically reducing the I/O communication to only loading new features and sharing partial results with other dies. For quantifying the overall system power, including I/O power, we built Vau da…
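
The weight-stationary tiling described in the abstract can be illustrated with a small software sketch. This is not the authors' hardware design: the gate order (i, f, g, o), the choice to split hidden units across tile rows and the concatenated input across tile columns, and all function names (`load_tiles`, `tiled_lstm_step`, and so on) are assumptions made for illustration. Each tile keeps its weight block fixed, receives only a slice of the current features, and hands a partial sum to its neighbour, so the only per-step traffic is features and partial results.

```python
import math


def matvec(M, v):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def split(n, parts):
    """Return `parts` contiguous (start, stop) ranges covering range(n)."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def gates_to_state(z, c_prev):
    """Apply LSTM nonlinearities to pre-activations laid out as [i | f | g | o]."""
    n = len(c_prev)
    i = [sigmoid(v) for v in z[0:n]]
    f = [sigmoid(v) for v in z[n:2 * n]]
    g = [math.tanh(v) for v in z[2 * n:3 * n]]
    o = [sigmoid(v) for v in z[3 * n:4 * n]]
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def lstm_step(W, b, x, h, c):
    """Monolithic reference step. W has shape (4H, X+H) and acts on x + h."""
    z = [p + bk for p, bk in zip(matvec(W, x + h), b)]
    return gates_to_state(z, c)


def load_tiles(W, b, H, n_rows, n_cols):
    """Slice W once into a grid of per-tile blocks that then stay stationary.

    Assumed partition: each tile row owns a slice of hidden units (all four
    gates for those units); each tile column owns a slice of the input x + h.
    """
    n_in = len(W[0])
    grid = []
    for r0, r1 in split(H, n_rows):
        rows = [g * H + r for g in range(4) for r in range(r0, r1)]
        blocks = [((c0, c1), [W[r][c0:c1] for r in rows])
                  for c0, c1 in split(n_in, n_cols)]
        grid.append(((r0, r1), [b[r] for r in rows], blocks))
    return grid


def tiled_lstm_step(grid, x, h, c):
    """One LSTM step in which tiles exchange only features and partial sums."""
    xh = x + h
    h_new, c_new = [], []
    for (r0, r1), bias, blocks in grid:
        acc = list(bias)  # partial sum entering the first tile of this row
        for (c0, c1), W_tile in blocks:
            partial = matvec(W_tile, xh[c0:c1])          # local compute only
            acc = [a + p for a, p in zip(acc, partial)]  # forwarded to neighbour
        h_part, c_part = gates_to_state(acc, c[r0:r1])   # last tile activates
        h_new += h_part
        c_new += c_part
    return h_new, c_new
```

Calling `load_tiles` once and then `tiled_lstm_step` at every time step should reproduce `lstm_step` up to floating-point rounding, since the tiled version only reorders the additions. The sketch ignores quantization, timing, and the physical inter-die links, all of which matter for the measured numbers reported above.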

Taxonomy

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory