PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning
Wei Niu, Xiaolong Ma, Sheng Lin, Shihao Wang, Xuehai Qian, Xue Lin, Yanzhi Wang, Bin Ren

TL;DR
PatDNN combines a novel pattern-based weight pruning technique with compiler optimizations, enabling real-time, high-accuracy DNN inference on mobile devices with up to 44.5x speedup over TensorFlow Lite.
Contribution
The paper presents a new pattern-based pruning method and an end-to-end framework that achieves high accuracy and efficiency for mobile DNN inference, bridging the gap between fine-grained accuracy and hardware efficiency.
Findings
Up to 44.5x speedup over TensorFlow Lite
Achieves real-time inference on large-scale DNNs
No accuracy loss compared to unpruned models
Abstract
With the emergence of a spectrum of high-end mobile devices, many applications that formerly required desktop-level computation capability are being transferred to these devices. However, executing the inference of Deep Neural Networks (DNNs) is still challenging considering high computation and storage demands, specifically, if real-time performance with high accuracy is needed. Weight pruning of DNNs is proposed, but existing schemes represent two extremes in the design space: non-structured pruning is fine-grained, accurate, but not hardware friendly; structured pruning is coarse-grained, hardware-efficient, but with higher accuracy loss. In this paper, we introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space. With the higher accuracy enabled by fine-grained pruning patterns, the unique…
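To make the idea of "fine-grained pruning patterns inside the coarse-grained structures" concrete, the sketch below applies a fixed-shape mask to each convolution kernel. It is illustrative only and rests on several assumptions that do not come from the text above: kernels are 3x3, each pattern keeps 4 weights, the four masks in `PATTERNS` are hypothetical, and the pattern is chosen by retained squared magnitude. The taxonomy lists the Alternating Direction Method of Multipliers, so the paper's own selection procedure is likely optimization-based; the magnitude heuristic here is a simple stand-in, and the names `retained_energy`, `prune_kernel`, and `prune_layer` are invented for this example.

```python
# Hypothetical pattern library: each pattern keeps 4 of the 9 positions in a
# 3x3 kernel (the centre plus three neighbours). These masks are illustrative
# assumptions, not the pattern set used by the authors.
PATTERNS = [
    ((1, 1, 0), (1, 1, 0), (0, 0, 0)),
    ((0, 1, 1), (0, 1, 1), (0, 0, 0)),
    ((0, 0, 0), (1, 1, 0), (1, 1, 0)),
    ((0, 0, 0), (0, 1, 1), (0, 1, 1)),
]


def retained_energy(kernel, pattern):
    """Sum of squared weights that the pattern would keep."""
    return sum(
        w * w
        for k_row, p_row in zip(kernel, pattern)
        for w, keep in zip(k_row, p_row)
        if keep
    )


def prune_kernel(kernel, patterns=PATTERNS):
    """Pick the pattern that keeps the most energy and zero the other weights.

    Returns (pattern_index, pruned_kernel). Ties go to the lowest index.
    """
    best = max(
        range(len(patterns)),
        key=lambda i: retained_energy(kernel, patterns[i]),
    )
    pruned = [
        [w if keep else 0.0 for w, keep in zip(k_row, p_row)]
        for k_row, p_row in zip(kernel, patterns[best])
    ]
    return best, pruned


def prune_layer(kernels):
    """Apply pattern pruning to every 3x3 kernel in a flat list of kernels."""
    return [prune_kernel(k) for k in kernels]
```

Because every kernel ends up with one of a small number of known sparsity shapes, a compiler can in principle generate specialized code per pattern instead of handling arbitrary sparse indices, which is the kind of regularity the abstract contrasts with non-structured pruning.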
Taxonomy
Methods: Pruning · Alternating Direction Method of Multipliers
