ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA

Song Han; Junlong Kang; Huizi Mao; Yiming Hu; Xin Li; Yubin Li; Dongliang Xie; Hong Luo; Song Yao; Yu Wang; Huazhong Yang; William J. Dally

arXiv:1612.00694·cs.CL·February 21, 2017·61 cites

TL;DR

This paper introduces ESE, an FPGA-based speech recognition engine that compresses LSTM models through pruning and quantization, making inference faster and more energy-efficient than on a CPU or GPU.

Contribution

It proposes a load-balance-aware pruning method, a scheduler for parallel processing, and a specialized FPGA architecture for efficient speech recognition on compressed LSTM models.

Findings

01 Achieves 20x model compression with negligible accuracy loss.

02 ESE processes speech recognition tasks at 2.52 TOPS on FPGA.

03 Outperforms CPU and GPU in speed and energy efficiency by large margins.

Abstract

Long Short-Term Memory (LSTM) is widely used in speech recognition. To achieve higher prediction accuracy, machine learning scientists have built larger and larger models. Such large models are both computation-intensive and memory-intensive. Deploying them results in high power consumption and a high total cost of ownership (TCO) for a data center. To speed up prediction and make it energy efficient, we first propose a load-balance-aware pruning method that can compress the LSTM model size by 20x (10x from pruning and 2x from quantization) with negligible loss of prediction accuracy. The pruned model is friendly to parallel processing. Next, we propose a scheduler that encodes and partitions the compressed model across the processing elements (PEs) for parallelism and schedules the complicated LSTM data flow. Finally, we design the hardware architecture, named Efficient…
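The abstract's load-balance-aware pruning can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that rows of a weight matrix are interleaved across PEs (row `r` goes to PE `r % num_pes`) and that "load-balanced" means every PE keeps the same number of nonzero weights. The function names, the 8x16 matrix, and the choice of 4 PEs are hypothetical; `keep_ratio=0.1` mirrors the 10x pruning figure.

```python
import random


def load_balanced_prune(weights, num_pes, keep_ratio):
    """Zero small-magnitude weights so each PE keeps the same nonzero count.

    Assumption (not from the source): row r is owned by PE (r % num_pes).
    """
    pruned = [row[:] for row in weights]
    for pe in range(num_pes):
        # Coordinates of every weight owned by this PE.
        coords = [(r, c)
                  for r in range(pe, len(weights), num_pes)
                  for c in range(len(weights[r]))]
        keep = round(len(coords) * keep_ratio)
        # Rank this PE's weights by magnitude, largest first.
        coords.sort(key=lambda rc: abs(weights[rc[0]][rc[1]]), reverse=True)
        # Everything past the top `keep` entries is pruned to zero.
        for r, c in coords[keep:]:
            pruned[r][c] = 0.0
    return pruned


def nonzeros_per_pe(weights, num_pes):
    """Count surviving weights per PE under the same row interleaving."""
    counts = [0] * num_pes
    for r, row in enumerate(weights):
        counts[r % num_pes] += sum(1 for w in row if w != 0.0)
    return counts


random.seed(0)
W = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(8)]
W_pruned = load_balanced_prune(W, num_pes=4, keep_ratio=0.1)
print(nonzeros_per_pe(W_pruned, num_pes=4))
```

Applying the magnitude threshold per PE rather than over the whole matrix is the point of the sketch: a single global threshold could leave one PE with many more nonzeros than another, and the parallel step would then wait on the busiest PE.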

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques

Methods: Pruning · Sigmoid Activation · Tanh Activation · Long Short-Term Memory