EIE: Efficient Inference Engine on Compressed Deep Neural Network
Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, William J. Dally

TL;DR
EIE is an energy-efficient inference engine for compressed deep neural networks. By exploiting sparsity and weight sharing, it runs inference faster and at lower power than CPU and GPU baselines, which makes fast inference practical on embedded systems.
Contribution
The paper introduces EIE, a novel hardware accelerator that performs inference directly on compressed DNNs, significantly reducing energy consumption and increasing speed compared to traditional CPU and GPU implementations.
Findings
EIE is 189x faster than a CPU and 13x faster than a GPU running the same DNN without compression.
EIE is 24,000x more energy efficient than a CPU and 3,400x more energy efficient than a GPU.
EIE processes FC layers of AlexNet at 1.88x10^4 frames/sec with 600mW power.
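As a rough sanity check on the throughput and power figures above, the time and energy per frame can be worked out directly. This is a back-of-the-envelope sketch that assumes the 600 mW figure is the average power drawn while sustaining the quoted frame rate; the variable names are illustrative.

```python
# Back-of-the-envelope arithmetic from the reported figures.
# Assumption: 600 mW is the average power at the quoted throughput.
frames_per_sec = 1.88e4   # reported AlexNet FC-layer throughput
power_watts = 0.600       # reported power dissipation (600 mW)

time_per_frame_us = 1e6 / frames_per_sec                   # microseconds per frame
energy_per_frame_uj = power_watts / frames_per_sec * 1e6   # microjoules per frame

print(f"{time_per_frame_us:.1f} us/frame, {energy_per_frame_uj:.1f} uJ/frame")
# prints "53.2 us/frame, 31.9 uJ/frame"
```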
Abstract
State-of-the-art deep neural networks (DNNs) have hundreds of millions of connections and are both computationally and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources and power budgets. While custom hardware helps the computation, fetching weights from DRAM is two orders of magnitude more expensive than ALU operations, and dominates the required power. Previously proposed 'Deep Compression' makes it possible to fit large DNNs (AlexNet and VGGNet) fully in on-chip SRAM. This compression is achieved by pruning the redundant connections and having multiple connections share the same weight. We propose an energy efficient inference engine (EIE) that performs inference on this compressed network model and accelerates the resulting sparse matrix-vector multiplication with weight sharing. Going from DRAM to SRAM gives EIE 120x energy…
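The core operation described in the abstract, a sparse matrix-vector multiplication over weights that share a small table of values, can be sketched in a few lines. This is a minimal software illustration, not the paper's hardware design or exact storage format: the column-wise layout and the names codebook, col_ptr, row_idx, and weight_idx are assumptions made for this example, and skipping zero inputs is an extra shortcut included in the sketch.

```python
# Sketch only: the layout and all names here are illustrative assumptions,
# not the storage format used by the EIE hardware.

def shared_weight_spmv(n_rows, col_ptr, row_idx, weight_idx, codebook, x):
    """Compute y = W @ x for a pruned, weight-shared matrix W stored by column.

    col_ptr[j]..col_ptr[j+1] delimits the stored entries of column j.
    row_idx[k] is the row of entry k; weight_idx[k] indexes into codebook.
    """
    y = [0.0] * n_rows
    for j, xj in enumerate(x):
        if xj == 0.0:
            continue  # a zero input contributes nothing, so skip its column
        for k in range(col_ptr[j], col_ptr[j + 1]):
            # Look up the real weight value through the shared table.
            y[row_idx[k]] += codebook[weight_idx[k]] * xj
    return y


# Dense equivalent of the matrix encoded below:
# W = [[0.5, 0.0,  0.0],
#      [0.0, 0.0, -1.0],
#      [0.5, 2.0,  0.0]]
codebook = [0.5, -1.0, 2.0]   # the only distinct weight values
col_ptr = [0, 2, 3, 4]
row_idx = [0, 2, 2, 1]
weight_idx = [0, 0, 2, 1]

x = [2.0, 0.0, 3.0]
y = shared_weight_spmv(3, col_ptr, row_idx, weight_idx, codebook, x)
print(y)  # prints [1.0, -3.0, 1.0]
```

Only the nonzero entries are stored, and each one holds a short index rather than a full-precision weight, which is why a pruned, weight-shared model takes far less memory than its dense form.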
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and ELM · Brain Tumor Detection and Classification
