GRIM: A General, Real-Time Deep Learning Inference Framework for Mobile Devices based on Fine-Grained Structured Weight Sparsity
Wei Niu, Zhengang Li, Xiaolong Ma, Peiyan Dong, Gang Zhou, Xuehai Qian, Xue Lin, Yanzhi Wang, Bin Ren

TL;DR
GRIM is a novel real-time deep learning inference framework for mobile devices that uses fine-grained structured weight sparsity and compiler optimizations to accelerate CNNs and RNNs with significant speedups.
Contribution
The paper introduces a new BCR pruning scheme and a comprehensive framework combining compiler optimizations and weight pruning for efficient mobile DNN inference.
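To make the idea of fine-grained structured sparsity concrete, here is a minimal sketch, assuming BCR denotes block-based column-row pruning: the weight matrix is split into blocks, and whole columns and rows are zeroed inside each block independently. The function name `bcr_prune`, its parameters, and the magnitude-based choice of which columns and rows to keep are illustrative assumptions, not the paper's actual training procedure.

```python
def bcr_prune(weights, block_rows, block_cols, keep_cols, keep_rows):
    """Hypothetical helper: zero low-norm columns, then low-norm rows,
    inside each block of a 2-D weight matrix (list of lists)."""
    n_rows, n_cols = len(weights), len(weights[0])
    pruned = [row[:] for row in weights]  # work on a copy
    for r0 in range(0, n_rows, block_rows):
        for c0 in range(0, n_cols, block_cols):
            rows = range(r0, min(r0 + block_rows, n_rows))
            cols = range(c0, min(c0 + block_cols, n_cols))
            # Rank this block's columns by squared L2 norm; drop the weakest.
            col_norm = {c: sum(pruned[r][c] ** 2 for r in rows) for c in cols}
            ranked_cols = sorted(cols, key=lambda c: col_norm[c], reverse=True)
            for c in ranked_cols[keep_cols:]:
                for r in rows:
                    pruned[r][c] = 0.0
            # Rank the rows on what survived; drop the weakest.
            row_norm = {r: sum(pruned[r][c] ** 2 for c in cols) for r in rows}
            ranked_rows = sorted(rows, key=lambda r: row_norm[r], reverse=True)
            for r in ranked_rows[keep_rows:]:
                for c in cols:
                    pruned[r][c] = 0.0
    return pruned


# Assumed toy input: a 4x4 matrix pruned in 2x2 blocks,
# keeping one column and one row per block.
demo = [
    [1, 9, 2, 0.5],
    [3, 8, 7, 0.1],
    [4, 1, 1, 6],
    [5, 2, 1, 2],
]
pruned = bcr_prune(demo, block_rows=2, block_cols=2, keep_cols=1, keep_rows=1)
for row in pruned:
    print(row)
```

Because each block chooses its own surviving columns and rows, the sparsity pattern is finer than pruning entire rows or columns of the full matrix, yet remains regular enough within a block for a compiler to exploit.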
Findings
Achieves up to 14.08x speedup over existing frameworks.
Supports both CNNs and RNNs with high accuracy.
Enables real-time DNN inference on resource-constrained mobile devices.
Abstract
It is appealing but challenging to achieve real-time deep neural network (DNN) inference on mobile devices because even the powerful modern mobile devices are considered as "resource-constrained" when executing large-scale DNNs. It necessitates the sparse model inference via weight pruning, i.e., DNN weight sparsity, and it is desirable to design a new DNN weight sparsity scheme that can facilitate real-time inference on mobile devices while preserving a high sparse model accuracy. This paper designs a novel mobile inference acceleration framework GRIM that is General to both convolutional neural networks (CNNs) and recurrent neural networks (RNNs) and that achieves Real-time execution and high accuracy, leveraging fine-grained structured sparse model Inference and compiler optimizations for Mobiles. We start by proposing a new fine-grained structured sparsity scheme through the…
Taxonomy
Topics: Advanced Neural Network Applications · CCD and CMOS Imaging Sensors · Anomaly Detection Techniques and Applications
Methods: Pruning
