Vau da muntanialas: Energy-efficient multi-die scalable acceleration of RNN inference
Gianna Paulin, Francesco Conti, Lukas Cavigelli, Luca Benini

TL;DR
This paper introduces Muntaniala, an energy-efficient, scalable RNN accelerator architecture that leverages multi-die systolic arrays to improve performance and reduce power consumption in LSTM inference.
Contribution
The paper presents Muntaniala, a novel multi-die RNN accelerator architecture with demonstrated energy efficiency and scalability for large LSTM models, including a multi-chip prototype system.
Findings
Achieved 3.25 TOP/s/W energy efficiency in silicon
Demonstrated multi-chip array performing LSTM inference in 330μs
Reduced system power to 9.0mW at 10MHz
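As a rough sanity check, the latency and power findings can be combined into an energy-per-inference estimate. This is a minimal sketch that assumes the 330 μs latency and the 9.0 mW system power were measured at the same operating point, which this summary does not state. The variable names are illustrative.

```python
# Assumption: both figures describe the same run of the multi-chip prototype.
latency_s = 330e-6   # LSTM inference latency from the findings (330 us)
power_w = 9.0e-3     # total system power from the findings (9.0 mW at 10 MHz)

# Energy is power multiplied by time.
energy_j = power_w * latency_s
print(f"{energy_j * 1e6:.2f} uJ per inference")
```

Under that assumption, the estimate comes to roughly 3 μJ per inference.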
Abstract
Recurrent neural networks such as Long Short-Term Memories (LSTMs) learn temporal dependencies by keeping an internal state, making them ideal for time-series problems such as speech recognition. However, the output-to-input feedback creates distinctive memory bandwidth and scalability challenges in designing accelerators for RNNs. We present Muntaniala, an RNN accelerator architecture for LSTM inference with a silicon-measured energy efficiency of 3.25 TOP/s/W and performance of 30.53 GOP/s in UMC 65 nm technology. The scalable design of Muntaniala allows running large RNN models by combining multiple tiles in a systolic array. We keep all parameters stationary on every die in the array, drastically reducing the I/O communication to only loading new features and sharing partial results with other dies. For quantifying the overall system power, including I/O power, we built Vau da…
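The weight-stationary, multi-tile idea in the abstract can be illustrated with a small software model. The sketch below is not the paper's hardware mapping: the row/column partition, the gate order (i, f, g, o), and the names `lstm_step`, `tiled_lstm_step`, and `chunks` are all assumptions chosen for illustration. It only shows that splitting one LSTM step across a grid of tiles, each holding a fixed slice of the weights and exchanging only input slices and partial sums, reproduces the single-device result.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(W, b, x, h, c):
    """Reference LSTM step on one device.

    W has 4*H rows over the concatenated vector [x; h];
    rows are grouped by gate in the (assumed) order i, f, g, o.
    """
    H, xh = len(h), x + h
    z = [sum(w * v for w, v in zip(row, xh)) + bias for row, bias in zip(W, b)]
    h_new, c_new = [], []
    for j in range(H):
        i, f, g, o = (z[k * H + j] for k in range(4))
        cj = sigmoid(f) * c[j] + sigmoid(i) * math.tanh(g)
        c_new.append(cj)
        h_new.append(sigmoid(o) * math.tanh(cj))
    return h_new, c_new


def chunks(n, parts):
    """Split range(n) into `parts` contiguous index ranges."""
    return [range(p * n // parts, (p + 1) * n // parts) for p in range(parts)]


def tiled_lstm_step(W, b, x, h, c, tile_rows, tile_cols):
    """Same step, modelled as a tile_rows x tile_cols grid of tiles.

    Each tile only ever reads its own slice of W (the weights stay put);
    only input slices and partial sums cross tile boundaries.
    """
    H, xh = len(h), x + h
    h_new, c_new = [0.0] * H, [0.0] * H
    for units in chunks(H, tile_rows):               # hidden units owned by one tile row
        for j in units:
            acc = [b[k * H + j] for k in range(4)]   # partial sums start at the bias
            for cols in chunks(len(xh), tile_cols):  # one tile per column slice
                for k in range(4):
                    acc[k] += sum(W[k * H + j][m] * xh[m] for m in cols)
            i, f, g, o = acc                         # activations after the last tile
            c_new[j] = sigmoid(f) * c[j] + sigmoid(i) * math.tanh(g)
            h_new[j] = sigmoid(o) * math.tanh(c_new[j])
    return h_new, c_new


# Small random example: 3 input features, 4 hidden units, a 2 x 3 tile grid.
random.seed(0)
X, H = 3, 4
W = [[random.uniform(-1, 1) for _ in range(X + H)] for _ in range(4 * H)]
b = [random.uniform(-1, 1) for _ in range(4 * H)]
x = [random.uniform(-1, 1) for _ in range(X)]
h = [random.uniform(-1, 1) for _ in range(H)]
c = [random.uniform(-1, 1) for _ in range(H)]

ref = lstm_step(W, b, x, h, c)
tiled = tiled_lstm_step(W, b, x, h, c, tile_rows=2, tile_cols=3)
max_diff = max(abs(p - q) for p, q in zip(ref[0] + ref[1], tiled[0] + tiled[1]))
print(max_diff < 1e-12)
```

The two results can differ by floating-point rounding because the tiled version sums in a different order, which is why the comparison uses a tolerance rather than exact equality.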
Taxonomy
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory
