Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration
Yifan Gong, Geng Yuan, Zheng Zhan, Wei Niu, Zhengang Li, Pu Zhao,, Yuxuan Cai, Sijia Liu, Bin Ren, Xue Lin, Xulong Tang, Yanzhi Wang

TL;DR
This paper introduces a flexible, fine-grained structured pruning method and compiler optimizations that automatically identify the best pruning schemes for each DNN layer, significantly improving inference speed on mobile devices without accuracy loss.
Contribution
It presents a novel, general pruning scheme applicable to all DNN layers and two automatic mapping methods to optimize pruning for hardware acceleration and accuracy.
Findings
Achieves up to 2.48× and 1.73× inference acceleration on CIFAR-10 and ImageNet.
Outperforms state-of-the-art DNN optimization frameworks.
Maintains accuracy while significantly improving inference speed.
Abstract
Weight pruning is an effective model compression technique to tackle the challenges of achieving real-time deep neural network (DNN) inference on mobile devices. However, prior pruning schemes have limited application scenarios due to accuracy degradation, difficulty in leveraging hardware acceleration, and/or restriction on certain types of DNN layers. In this paper, we propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations that are applicable to any type of DNN layer while achieving high accuracy and hardware inference performance. With the flexibility of applying different pruning schemes to different layers enabled by our compiler optimizations, we further probe into the new problem of determining the best-suited pruning scheme considering the different acceleration and accuracy performance of various pruning schemes. Two pruning scheme…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Neural Network Applications · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning
MethodsPruning
