ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA
Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li, Dongliang Xie, Hong Luo, Song Yao, Yu Wang, Huazhong Yang, William J. Dally

TL;DR
This paper introduces ESE, an FPGA-based speech recognition engine that compresses LSTM models through pruning and quantization, making inference faster and more energy-efficient on hardware.
Contribution
It proposes a load-balance-aware pruning method, a scheduler for parallel processing, and a specialized FPGA architecture for efficient speech recognition on compressed LSTM models.
Findings
Achieves 20x model compression with negligible accuracy loss.
ESE processes speech recognition tasks at 2.52 TOPS on FPGA.
Outperforms CPU and GPU in speed and energy efficiency by large margins.
Abstract
Long Short-Term Memory (LSTM) is widely used in speech recognition. In order to achieve higher prediction accuracy, machine learning scientists have built larger and larger models. Such large models are both computation-intensive and memory-intensive. Deploying such bulky models results in high power consumption and a high total cost of ownership (TCO) for a data center. To speed up prediction and make it energy efficient, we first propose a load-balance-aware pruning method that can compress the LSTM model size by 20x (10x from pruning and 2x from quantization) with negligible loss of prediction accuracy. The pruned model is friendly for parallel processing. Next, we propose a scheduler that encodes and partitions the compressed model to each PE for parallelism and schedules the complicated LSTM data flow. Finally, we design the hardware architecture, named Efficient…
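The idea behind load-balance-aware pruning can be illustrated with a small sketch. This is not the authors' implementation. It assumes that rows of a weight matrix are assigned to PEs in an interleaved fashion (row r goes to PE r mod N) and that pruning removes the smallest-magnitude weights, neither of which is stated in this summary. The function names `prune_load_balanced` and `nonzeros_per_pe` are hypothetical. The sketch prunes each PE's share of the matrix to the same sparsity, so every PE is left with an equal number of nonzero weights to process.

```python
import random


def prune_load_balanced(weights, num_pes, sparsity):
    """Magnitude-prune each PE's share of rows to the same sparsity.

    Assumption (not from the summary): row r belongs to PE (r % num_pes).
    """
    pruned = [row[:] for row in weights]
    for pe in range(num_pes):
        # Coordinates of every weight owned by this PE.
        coords = [(r, c)
                  for r in range(pe, len(weights), num_pes)
                  for c in range(len(weights[r]))]
        # Zero the smallest-magnitude fraction within this PE only,
        # rather than applying one global threshold to the whole matrix.
        coords.sort(key=lambda rc: abs(weights[rc[0]][rc[1]]))
        for r, c in coords[:int(len(coords) * sparsity)]:
            pruned[r][c] = 0.0
    return pruned


def nonzeros_per_pe(weights, num_pes):
    """Count the nonzero weights each PE would have to process."""
    counts = [0] * num_pes
    for r, row in enumerate(weights):
        counts[r % num_pes] += sum(1 for w in row if w != 0.0)
    return counts


rng = random.Random(0)
dense = [[rng.gauss(0.0, 1.0) for _ in range(32)] for _ in range(16)]
# sparsity=0.9 mirrors the roughly 10x reduction attributed to pruning.
sparse = prune_load_balanced(dense, num_pes=4, sparsity=0.9)
print(nonzeros_per_pe(sparse, num_pes=4))
```

With a single global magnitude threshold, the per-PE counts would generally differ, and the PE with the most nonzeros would determine how long the parallel step takes. Pruning per partition keeps the counts equal.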
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques
Methods: Pruning · Sigmoid Activation · Tanh Activation · Long Short-Term Memory
