Energy Efficient LSTM Accelerators for Embedded FPGAs through Parameterised Architecture Design
Chao Qian, Tianheng Ling, and Gregor Schiele

TL;DR
This paper introduces a parameterised LSTM accelerator for embedded FPGAs that improves execution speed and energy efficiency and can be adapted to different resource constraints.
Contribution
It presents a new, adaptable hardware design for LSTM accelerators on resource-scarce embedded FPGAs, optimised for energy consumption and execution speed.
Findings
Achieves an energy efficiency of 11.89 GOP/s/W during real-time inference at 32873 samples/s.
Supports multiple optimisation parameters, such as DSP usage and the implementation of activation functions.
Improves execution speed and reduces energy consumption compared to related work.
Abstract
Long Short-term Memory Networks (LSTMs) are a vital Deep Learning technique suitable for performing on-device time series analysis on local sensor data streams of embedded devices. In this paper, we propose a new hardware accelerator design for LSTMs specially optimised for resource-scarce embedded Field Programmable Gate Arrays (FPGAs). Our design improves the execution speed and reduces energy consumption compared to related work. Moreover, it can be adapted to different situations using a number of optimisation parameters, such as the usage of DSPs or the implementation of activation functions. We present our key design decisions and evaluate the performance. Our accelerator achieves an energy efficiency of 11.89 GOP/s/W during a real-time inference with 32873 samples/s.
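To put the two reported figures in more familiar units, the short sketch below converts them using unit identities only. Since 1 W = 1 J/s, an efficiency in GOP/s/W is the same as giga-operations per joule, so its reciprocal gives the average energy per operation. Likewise, the reciprocal of the sample rate gives the average interval between samples. The function names are illustrative and do not come from the paper, and the sample period is an average spacing, not necessarily the latency of a single inference.

```python
def energy_per_op_pj(gops_per_watt: float) -> float:
    """Average energy per operation in picojoules.

    1 GOP/s/W equals 1e9 operations per joule, because 1 W = 1 J/s.
    """
    ops_per_joule = gops_per_watt * 1e9
    return 1e12 / ops_per_joule  # 1 J = 1e12 pJ


def sample_period_us(samples_per_second: float) -> float:
    """Average time between consecutive samples in microseconds."""
    return 1e6 / samples_per_second


# Figures quoted in the abstract.
EFFICIENCY_GOPS_PER_WATT = 11.89
THROUGHPUT_SAMPLES_PER_S = 32873

if __name__ == "__main__":
    print(f"{energy_per_op_pj(EFFICIENCY_GOPS_PER_WATT):.1f} pJ per operation")
    print(f"{sample_period_us(THROUGHPUT_SAMPLES_PER_S):.2f} us per sample")
```

Under these conversions, the reported efficiency corresponds to roughly 84 pJ per operation, and the reported throughput to roughly 30 µs per sample.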
