PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning

Wei Niu; Xiaolong Ma; Sheng Lin; Shihao Wang; Xuehai Qian; Xue Lin; Yanzhi Wang; Bin Ren

arXiv:2001.00138·cs.LG·January 23, 2020

TL;DR

PatDNN combines a pattern-based weight pruning technique with compiler optimizations to enable real-time, high-accuracy DNN inference on mobile devices, reporting up to 44.5x speedup over TensorFlow Lite.

Contribution

The paper presents a new pattern-based pruning method and an end-to-end framework that achieves high accuracy and efficiency for mobile DNN inference, bridging the gap between fine-grained accuracy and hardware efficiency.

Findings

01. Up to 44.5x speedup over TensorFlow Lite

02. Achieves real-time inference on large-scale DNNs

03. No accuracy loss compared to unpruned models

Abstract

With the emergence of a spectrum of high-end mobile devices, many applications that formerly required desktop-level computation capability are being transferred to these devices. However, executing the inference of Deep Neural Networks (DNNs) is still challenging considering high computation and storage demands, specifically, if real-time performance with high accuracy is needed. Weight pruning of DNNs is proposed, but existing schemes represent two extremes in the design space: non-structured pruning is fine-grained, accurate, but not hardware friendly; structured pruning is coarse-grained, hardware-efficient, but with higher accuracy loss. In this paper, we introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space. With the higher accuracy enabled by fine-grained pruning patterns, the unique…
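To make the idea of "fine-grained pruning patterns inside the coarse-grained structures" concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the kernel size (3x3), the number of kept weights per pattern (4), the four masks in `PATTERNS`, and the magnitude-based selection rule in `prune_kernel` are all hypothetical choices made for this example. Each kernel is restricted to one mask from a small predefined set, so the sparsity is fine-grained inside the kernel while every kernel still follows one of a few regular shapes.

```python
# Hypothetical pattern set: each mask keeps 4 of the 9 positions in a 3x3 kernel.
# These masks are illustrative assumptions, not the pattern set used in the paper.
PATTERNS = [
    ((1, 1, 0), (1, 1, 0), (0, 0, 0)),
    ((0, 1, 1), (0, 1, 1), (0, 0, 0)),
    ((0, 0, 0), (1, 1, 0), (1, 1, 0)),
    ((0, 0, 0), (0, 1, 1), (0, 1, 1)),
]


def kept_magnitude(kernel, pattern):
    """Sum of absolute weights that the pattern would keep."""
    return sum(
        abs(kernel[r][c])
        for r in range(3)
        for c in range(3)
        if pattern[r][c]
    )


def prune_kernel(kernel, patterns=PATTERNS):
    """Pick the pattern preserving the most weight magnitude and apply it.

    Returns (pattern_index, pruned_kernel). The selection rule is an
    assumption made for this sketch.
    """
    best = max(
        range(len(patterns)),
        key=lambda i: kept_magnitude(kernel, patterns[i]),
    )
    mask = patterns[best]
    pruned = [
        [kernel[r][c] if mask[r][c] else 0.0 for c in range(3)]
        for r in range(3)
    ]
    return best, pruned


# Example 3x3 kernel with made-up weights.
kernel = [
    [0.9, -0.1, 0.2],
    [0.8, 0.7, -0.3],
    [0.05, 0.6, 0.1],
]

pattern_id, pruned = prune_kernel(kernel)
print(pattern_id)
for row in pruned:
    print(row)
```

Because every pruned kernel is identified by a small pattern index, the sparsity layout is known ahead of time, which is the kind of regularity a compiler could exploit, in contrast to non-structured pruning where nonzero positions are arbitrary.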

Taxonomy

Methods: Pruning · Alternating Direction Method of Multipliers