DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices

Dawei Li; Xiaolong Wang; Deguang Kong

arXiv:1708.04728 · cs.CV · January 12, 2018 · 30 cites

TL;DR

DeepRebirth introduces a novel framework that accelerates deep neural network execution on mobile devices by slimming non-tensor layers, achieving over 3x speed-up with minimal accuracy loss.

Contribution

The paper proposes a layer slimming method for non-tensor layers in neural networks, significantly improving runtime speed and memory efficiency on mobile devices.

Findings

01

Over 3x speed-up on GoogLeNet with minimal accuracy drop

02

Reduces runtime memory by 2.5x

03

Achieves 65 ms inference on Samsung Galaxy S6 CPU

Abstract

Deploying deep neural networks on mobile devices is a challenging task. Current model compression methods such as matrix decomposition effectively reduce the deployed model size, but still cannot satisfy real-time processing requirements. This paper first discovers that the major obstacle is the excessive execution time of non-tensor layers, such as pooling and normalization, which have no tensor-like trainable parameters. This motivates us to design a novel acceleration framework, DeepRebirth, which works by "slimming" existing consecutive and parallel non-tensor and tensor layers. The layer slimming is executed at different substructures: (a) streamline slimming, which merges consecutive non-tensor and tensor layers vertically; (b) branch slimming, which merges non-tensor and tensor branches horizontally. The proposed optimization operations significantly accelerate the model execution and also greatly…
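The abstract does not spell out how a non-tensor layer is merged into a neighbouring tensor layer. As a rough illustration of vertical merging in general, the sketch below folds an inference-time per-channel normalization into the 1x1 convolution that precedes it, so two layers execute as one. This is a well-known analytic fold and only an assumed stand-in, not necessarily the procedure DeepRebirth uses; all function names and the toy values are hypothetical.

```python
import math

def conv1x1_pixel(x, weight, bias):
    """Apply a 1x1 convolution at a single pixel: y = W x + b over channels."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def normalize(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time per-channel normalization with fixed statistics."""
    return [g * (v - m) / math.sqrt(s + eps) + bt
            for v, g, bt, m, s in zip(y, gamma, beta, mean, var)]

def fold_norm_into_conv(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Return conv weights and biases that absorb the normalization layer."""
    folded_w, folded_b = [], []
    for row, b, g, bt, m, s in zip(weight, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(s + eps)          # per-output-channel scale
        folded_w.append([scale * w for w in row])
        folded_b.append(scale * (b - m) + bt)
    return folded_w, folded_b

# Toy values (assumed, for illustration only): 3 input and 2 output channels.
x = [0.5, -1.2, 2.0]
weight = [[0.1, 0.2, -0.3], [0.4, -0.5, 0.6]]
bias = [0.05, -0.1]
gamma, beta = [1.5, 0.8], [0.2, -0.3]
mean, var = [0.1, -0.2], [0.9, 1.4]

# Original path: convolution followed by a separate normalization layer.
two_layer = normalize(conv1x1_pixel(x, weight, bias), gamma, beta, mean, var)

# Merged path: a single convolution with folded parameters.
fw, fb = fold_norm_into_conv(weight, bias, gamma, beta, mean, var)
one_layer = conv1x1_pixel(x, fw, fb)

max_abs_diff = max(abs(a - b) for a, b in zip(two_layer, one_layer))
```

The fold here is exact because the normalization is affine per channel. Layers such as pooling cannot be absorbed this way, so a merged layer replacing them would presumably need its parameters re-learned rather than computed in closed form.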

Taxonomy

Topics: Advanced Neural Network Applications · Tensor decomposition and applications · Parallel Computing and Optimization Techniques

Methods: Residual Connection · Convolution · Average Pooling · Fire Module · Global Average Pooling · 1x1 Convolution · Dropout · Xavier Initialization · Max Pooling