DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices
Dawei Li, Xiaolong Wang, Deguang Kong

TL;DR
DeepRebirth introduces a novel framework that accelerates deep neural network execution on mobile devices by slimming non-tensor layers, achieving over 3x speed-up with minimal accuracy loss.
Contribution
The paper proposes a layer slimming method for non-tensor layers in neural networks, significantly improving runtime speed and memory efficiency on mobile devices.
Findings
Over 3x speed-up on GoogLeNet with minimal accuracy drop
Reduces runtime memory by 2.5x
Achieves 65ms inference on Samsung Galaxy S6 CPU
Abstract
Deploying deep neural networks on mobile devices is a challenging task. Current model compression methods such as matrix decomposition effectively reduce the deployed model size, but still cannot satisfy real-time processing requirements. This paper first discovers that the major obstacle is the excessive execution time of non-tensor layers, such as pooling and normalization, which have no tensor-like trainable parameters. This motivates us to design a novel acceleration framework, DeepRebirth, which works by "slimming" existing consecutive and parallel non-tensor and tensor layers. The layer slimming is executed at different substructures: (a) streamline slimming, which merges consecutive non-tensor and tensor layers vertically; (b) branch slimming, which merges non-tensor and tensor branches horizontally. The proposed optimization operations significantly accelerate the model execution and also greatly…
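To make the idea of merging a non-tensor layer into an adjacent tensor layer concrete, here is a minimal sketch in plain Python. It is not DeepRebirth's procedure, and the abstract does not spell out how the merged layers are obtained. Instead it uses a simpler, well-known special case as a stand-in: an inference-time normalization layer (a per-channel scale and shift) folded exactly into the preceding linear layer. All function names and toy values are hypothetical and chosen only for illustration.

```python
import math

def linear(W, b, x):
    """Tensor layer: y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def normalize(y, mean, var, gamma, beta, eps=1e-5):
    """Non-tensor layer at inference time: per-channel scale and shift."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, m, v, g, bt in zip(y, mean, var, gamma, beta)]

def fold(W, b, mean, var, gamma, beta, eps=1e-5):
    """Fold the normalization into the linear layer's weights and bias."""
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    W_folded = [[s * w for w in row] for row, s in zip(W, scale)]
    b_folded = [s * (bi - m) + bt for s, bi, m, bt in zip(scale, b, mean, beta)]
    return W_folded, b_folded

# Toy values (assumptions, not taken from the paper).
W = [[1.0, 2.0], [0.5, -1.0]]
b = [0.1, -0.2]
mean, var = [0.3, -0.1], [4.0, 0.25]
gamma, beta = [1.5, 0.8], [0.0, 0.2]
x = [2.0, -3.0]

# Two layers executed one after the other...
two_layers = normalize(linear(W, b, x), mean, var, gamma, beta)

# ...versus a single merged layer with adjusted parameters.
W_folded, b_folded = fold(W, b, mean, var, gamma, beta)
one_layer = linear(W_folded, b_folded, x)

max_diff = max(abs(p - q) for p, q in zip(two_layers, one_layer))
print("outputs agree:", max_diff < 1e-9)
```

In this special case the merge is exact, so one layer runs in place of two with no change in output. Layers such as pooling cannot be folded algebraically like this, which is consistent with the summary above reporting a small accuracy drop rather than none.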
Taxonomy
Topics: Advanced Neural Network Applications · Tensor decomposition and applications · Parallel Computing and Optimization Techniques
Methods: Residual Connection · Convolution · Average Pooling · Fire Module · Global Average Pooling · 1x1 Convolution · Dropout · Xavier Initialization · Max Pooling
