Recurrent Neural Networks Hardware Implementation on FPGA
Andre Xian Ming Chang, Berin Martini, Eugenio Culurciello

TL;DR
This paper presents a hardware implementation of an LSTM RNN on an FPGA that runs over 21 times faster than the embedded ARM CPU on the same chip, with potential applications in mobile devices.
Contribution
It introduces an FPGA-based hardware implementation of LSTM RNNs on the Xilinx Zynq 7020 that accelerates computation by more than 21x relative to the device's embedded ARM CPU.
Findings
FPGA implementation runs over 21x faster than the embedded ARM CPU
Implemented 2-layer RNN with 128 hidden units
Tested with character-level language model
Abstract
Recurrent Neural Networks (RNNs) have the ability to retain memory and learn data sequences. Due to the recurrent nature of RNNs, it is sometimes hard to parallelize all of their computations on conventional hardware. CPUs do not currently offer large parallelism, while GPUs offer limited parallelism due to sequential components of RNN models. In this paper we present a hardware implementation of a Long Short-Term Memory (LSTM) recurrent network on the programmable logic of the Zynq 7020 FPGA from Xilinx. We implemented an RNN with 2 layers and 128 hidden units in hardware, and it has been tested using a character-level language model. The implementation is more than 21x faster than the ARM CPU embedded on the Zynq 7020 FPGA. This work can potentially evolve into an RNN co-processor for future mobile devices.
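To make the sequential dependence mentioned in the abstract concrete, here is a minimal software sketch of a stacked LSTM forward pass, using the standard LSTM gate equations without peephole connections. It is not the paper's hardware design: the function names (`lstm_step`, `init_params`, `run_two_layer`), the random weight initialization, and the floating-point arithmetic are all assumptions made for illustration. Only the 2-layer structure comes from the text above.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Dense matrix-vector product; each output row is independent,
    # which is the part that parallelizes well.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a standard LSTM cell (illustrative, no peepholes)."""
    z = x + h_prev  # list concatenation: [x, h_prev]
    pre = {}
    for name in ("i", "f", "o", "g"):
        W, b = params[name]
        pre[name] = [a + bias for a, bias in zip(matvec(W, z), b)]
    i = [sigmoid(a) for a in pre["i"]]    # input gate
    f = [sigmoid(a) for a in pre["f"]]    # forget gate
    o = [sigmoid(a) for a in pre["o"]]    # output gate
    g = [math.tanh(a) for a in pre["g"]]  # candidate cell values
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def init_params(input_size, hidden_size, rng):
    # Hypothetical initialization: small uniform weights, zero biases.
    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
                for _ in range(hidden_size)]
    return {name: (mat(), [0.0] * hidden_size) for name in ("i", "f", "o", "g")}


def run_two_layer(seq, hidden_size, rng):
    """Run a 2-layer LSTM over a sequence and return the final top-layer h."""
    input_size = len(seq[0])
    layers = [init_params(input_size, hidden_size, rng),
              init_params(hidden_size, hidden_size, rng)]
    state = [([0.0] * hidden_size, [0.0] * hidden_size) for _ in layers]
    for x in seq:  # time steps must run in order: step t needs h, c from t-1
        inp = x
        for k, p in enumerate(layers):
            h, c = lstm_step(inp, state[k][0], state[k][1], p)
            state[k] = (h, c)
            inp = h  # layer k's output feeds layer k+1
    return state[-1][0]
```

For example, `run_two_layer(one_hot_chars, 128, random.Random(0))` would match the layer count and hidden size quoted in the abstract, where `one_hot_chars` is a hypothetical list of one-hot character vectors. The loop over `seq` is the part that cannot be parallelized across time, while the matrix-vector products inside each step can be.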
Taxonomy
Topics: Advanced Neural Network Applications · Neural Networks and Applications · Human Pose and Action Recognition
