Energy Efficient LSTM Accelerators for Embedded FPGAs through Parameterised Architecture Design

Chao Qian, Tianheng Ling, and Gregor Schiele

arXiv:2604.19293·cs.AR·April 22, 2026

TL;DR

This paper introduces a parameterised LSTM accelerator for embedded FPGAs that improves execution speed and energy efficiency over related work and can be adapted to different resource constraints.

Contribution

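A new hardware accelerator design for LSTMs, optimised for resource-scarce embedded FPGAs, whose optimisation parameters (such as DSP usage and the implementation of activation functions) allow it to be adapted to different situations.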

Findings

01. Achieves an energy efficiency of 11.89 GOP/s/W during real-time inference at 32873 samples/s.

02. Exposes several optimisation parameters, such as the usage of DSPs and the implementation of activation functions.

03. Improves execution speed and reduces energy consumption compared to related work.

Abstract

Long Short-Term Memory networks (LSTMs) are a vital Deep Learning technique for on-device time series analysis of local sensor data streams on embedded devices. In this paper, we propose a new hardware accelerator design for LSTMs, specially optimised for resource-scarce embedded Field Programmable Gate Arrays (FPGAs). Our design improves execution speed and reduces energy consumption compared to related work. Moreover, it can be adapted to different situations using a number of optimisation parameters, such as the usage of DSPs or the implementation of activation functions. We present our key design decisions and evaluate the performance. Our accelerator achieves an energy efficiency of 11.89 GOP/s/W during real-time inference at 32873 samples/s.
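To make the GOP/s/W figure easier to interpret, the following is a minimal sketch of how such a metric is commonly derived from throughput and power. It is not taken from the paper. The function names are hypothetical, the operation count per inference and the power draw are placeholder values, and the sketch assumes that one sample corresponds to one inference. Only the 32873 samples/s rate and the 11.89 GOP/s/W figure come from the abstract.

```python
def energy_efficiency_gops_per_watt(ops_per_inference, inferences_per_second, power_watts):
    """Energy efficiency in GOP/s/W: (operations per second / 1e9) / power."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    gops = ops_per_inference * inferences_per_second / 1e9
    return gops / power_watts


def energy_per_inference_joules(power_watts, inferences_per_second):
    """Energy spent on a single inference, in joules."""
    if inferences_per_second <= 0:
        raise ValueError("inferences_per_second must be positive")
    return power_watts / inferences_per_second


def implied_power_watts(ops_per_inference, inferences_per_second, gops_per_watt):
    """Power draw implied by a reported efficiency and throughput."""
    if gops_per_watt <= 0:
        raise ValueError("gops_per_watt must be positive")
    return ops_per_inference * inferences_per_second / 1e9 / gops_per_watt


if __name__ == "__main__":
    SAMPLES_PER_SECOND = 32873      # from the abstract
    REPORTED_EFFICIENCY = 11.89     # GOP/s/W, from the abstract
    OPS_PER_INFERENCE = 10_000      # placeholder, NOT from the paper

    power = implied_power_watts(OPS_PER_INFERENCE, SAMPLES_PER_SECOND, REPORTED_EFFICIENCY)
    print(f"implied power for the placeholder model: {power:.4f} W")
    print(f"energy per inference: {energy_per_inference_joules(power, SAMPLES_PER_SECOND):.3e} J")
```

Because the metric divides throughput by power, the same GOP/s/W value can describe very different absolute power budgets; the real operation count per inference has to come from the model evaluated in the paper.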
