EIE: Efficient Inference Engine on Compressed Deep Neural Network

Song Han; Xingyu Liu; Huizi Mao; Jing Pu; Ardavan Pedram; Mark A. Horowitz; William J. Dally

arXiv:1602.01528·cs.CV·May 4, 2016·143 cites

TL;DR

EIE is an energy-efficient inference engine designed for compressed deep neural networks, achieving significant speed and power savings by exploiting sparsity and weight sharing, enabling fast inference on embedded systems.

Contribution

The paper introduces EIE, a novel hardware accelerator that performs inference directly on compressed DNNs, significantly reducing energy consumption and increasing speed compared to traditional CPU and GPU implementations.

Findings

01

EIE is 189x faster than a CPU and 13x faster than a GPU running the same DNN without compression.

02

EIE is 24,000x more energy efficient than a CPU and 3,400x more energy efficient than a GPU.

03

EIE processes FC layers of AlexNet at 1.88x10^4 frames/sec with 600mW power.
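
As a rough reading of finding 03, dividing power by throughput gives the energy spent per frame on the FC layers. This is a back-of-the-envelope sketch that assumes the 600 mW figure applies at the stated frame rate; the symbols P, f, and E_frame are introduced here for illustration only.

```latex
E_{\text{frame}} = \frac{P}{f}
  = \frac{0.6\ \text{W}}{1.88 \times 10^{4}\ \text{frames/s}}
  \approx 3.2 \times 10^{-5}\ \text{J}
  \approx 32\ \mu\text{J per frame}
```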

Abstract

State-of-the-art deep neural networks (DNNs) have hundreds of millions of connections and are both computationally and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources and power budgets. While custom hardware helps the computation, fetching weights from DRAM is two orders of magnitude more expensive than ALU operations, and dominates the required power. Previously proposed 'Deep Compression' makes it possible to fit large DNNs (AlexNet and VGGNet) fully in on-chip SRAM. This compression is achieved by pruning the redundant connections and having multiple connections share the same weight. We propose an energy efficient inference engine (EIE) that performs inference on this compressed network model and accelerates the resulting sparse matrix-vector multiplication with weight sharing. Going from DRAM to SRAM gives EIE 120x energy…
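
The computation the abstract describes, a sparse matrix-vector multiplication in which each surviving weight is stored as an index into a small table of shared values, can be sketched in software as follows. This is a minimal illustration, not the EIE hardware design: the function names `compress` and `sparse_shared_matvec`, the column-wise storage layout, and the nearest-value codebook assignment are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# One column of the compressed matrix: (row index, codebook index) per nonzero weight.
Column = List[Tuple[int, int]]


def compress(dense: Sequence[Sequence[float]], codebook: Sequence[float]) -> List[Column]:
    """Hypothetical encoder: keep only nonzero weights and replace each one
    with the index of the nearest shared value in the codebook."""
    n_cols = len(dense[0])
    columns: List[Column] = []
    for j in range(n_cols):
        col: Column = []
        for i, row in enumerate(dense):
            w = row[j]
            if w != 0.0:  # pruned connections are simply not stored
                code = min(range(len(codebook)), key=lambda k: abs(codebook[k] - w))
                col.append((i, code))
        columns.append(col)
    return columns


def sparse_shared_matvec(columns: List[Column], codebook: Sequence[float],
                         x: Sequence[float], n_rows: int) -> List[float]:
    """Compute y = W @ x from the compressed form, looking up each shared
    weight in the codebook instead of storing it explicitly."""
    y = [0.0] * n_rows
    for j, col in enumerate(columns):
        for i, code in col:
            y[i] += codebook[code] * x[j]
    return y


# Toy example: a 3x3 pruned matrix whose nonzeros use three shared values.
codebook = [-0.5, 0.25, 1.0]
dense = [
    [0.0, 1.0, 0.0],
    [-0.5, 0.0, 0.25],
    [0.0, 0.0, 1.0],
]
x = [2.0, 3.0, 4.0]
y = sparse_shared_matvec(compress(dense, codebook), codebook, x, n_rows=3)
print(y)  # [3.0, 0.0, 4.0]
```

Only the nonzero entries and a small codebook are stored, which is the property that lets a compressed model occupy far less memory than its dense counterpart.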

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning and ELM · Brain Tumor Detection and Classification
