Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications

Yong-Deok Kim; Eunhyeok Park; Sungjoo Yoo; Taelim Choi; Lu Yang; Dongjun Shin

arXiv:1511.06530·cs.CV·February 25, 2016·ICLR·118 cites

TL;DR

This paper introduces a simple, effective one-shot compression scheme for deep CNNs that significantly reduces model size, runtime, and energy consumption on mobile devices with minimal accuracy loss.

Contribution

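The paper presents a one-shot whole network compression method for mobile deployment. It selects ranks with variational Bayesian matrix factorization, applies Tucker decomposition to each kernel tensor, and fine-tunes the compressed network to recover the accumulated accuracy loss.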

Findings

01. Significant reduction in model size, runtime, and energy use.

02. Effective compression of various CNN architectures, including AlexNet, VGGS, GoogLeNet, and VGG-16.

03. Minor accuracy loss after compression.

Abstract

Although the latest high-end smartphones have powerful CPUs and GPUs, running deeper convolutional neural networks (CNNs) for complex tasks such as ImageNet classification on mobile devices is challenging. To deploy deep CNNs on mobile devices, we present a simple and effective scheme to compress the entire CNN, which we call one-shot whole network compression. The proposed scheme consists of three steps: (1) rank selection with variational Bayesian matrix factorization, (2) Tucker decomposition on the kernel tensor, and (3) fine-tuning to recover the accumulated loss of accuracy. Each step can be easily implemented using publicly available tools. We demonstrate the effectiveness of the proposed scheme by testing the performance of various compressed CNNs (AlexNet, VGGS, GoogLeNet, and VGG-16) on the smartphone. Significant reductions in model size, runtime, and energy consumption are obtained,…
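Step (2) of the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: the kernel tensor is laid out as (T, S, D, D) for T output channels, S input channels, and a D×D spatial window; only the two channel modes are decomposed (a Tucker-2 form); the factors come from a truncated higher-order SVD; and the ranks are picked by hand rather than by variational Bayesian matrix factorization. The function names are hypothetical.

```python
import numpy as np


def mode_unfold(tensor, mode):
    """Unfold `tensor` along `mode` into a (tensor.shape[mode], -1) matrix."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def tucker2_hosvd(kernel, rank_out, rank_in):
    """Truncated HOSVD on the channel modes of a (T, S, D, D) kernel.

    Assumption: ranks are supplied by the caller; the rank-selection
    step described in the abstract is not reproduced here.
    """
    # Leading left singular vectors of each channel-mode unfolding.
    u_out = np.linalg.svd(mode_unfold(kernel, 0), full_matrices=False)[0]
    u_in = np.linalg.svd(mode_unfold(kernel, 1), full_matrices=False)[0]
    u_out = u_out[:, :rank_out]  # shape (T, rank_out)
    u_in = u_in[:, :rank_in]     # shape (S, rank_in)
    # Project the kernel onto both bases to obtain the core tensor.
    core = np.einsum("tsij,ta,sb->abij", kernel, u_out, u_in)
    return core, u_out, u_in


def reconstruct(core, u_out, u_in):
    """Rebuild the approximate (T, S, D, D) kernel from its factors."""
    return np.einsum("abij,ta,sb->tsij", core, u_out, u_in)


def param_count(t, s, d, rank_out, rank_in):
    """Weights before and after replacing one layer with three smaller ones."""
    original = t * s * d * d
    compressed = s * rank_in + rank_in * rank_out * d * d + rank_out * t
    return original, compressed


# Usage: a random kernel standing in for one convolutional layer.
rng = np.random.default_rng(0)
kernel = rng.standard_normal((16, 8, 3, 3))
core, u_out, u_in = tucker2_hosvd(kernel, rank_out=4, rank_in=3)
approx = reconstruct(core, u_out, u_in)
rel_error = np.linalg.norm(kernel - approx) / np.linalg.norm(kernel)
print(core.shape, rel_error)
print(param_count(16, 8, 3, 4, 3))  # (1152, 196)
```

Under these assumptions the three factors map onto a 1×1 convolution (S to rank_in channels), a D×D convolution on the reduced channels, and a second 1×1 convolution (rank_out to T channels). A random kernel has no low-rank structure, so the printed error is large; the fine-tuning step in the abstract exists to recover the accuracy lost to this kind of truncation.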

Taxonomy

Topics: Tensor decomposition and applications · Advanced Data Compression Techniques · Advanced Neural Network Applications

Methods: 1x1 Convolution · Convolution · Average Pooling · Local Response Normalization · Auxiliary Classifier · Inception Module · Dropout · Dense Connections · Max Pooling