CPrune: Compiler-Informed Model Pruning for Efficient Target-Aware DNN Execution

Taeho Kim; Yongin Kwon; Jemin Lee; Taeho Kim; Sangtae Ha

arXiv:2207.01260·cs.LG·July 22, 2022

TL;DR

CPrune integrates information from compiler tuning into model pruning to speed up deep neural network execution on mobile devices, reporting up to 2.73x speedup over TVM auto-tune while meeting a target accuracy.

Contribution

The paper introduces a compiler-informed pruning method that uses the structural information of subgraphs built during compiler tuning for target-aware DNN optimization.

Findings

1. Achieves up to 2.73x speedup over TVM auto-tune

2. Maintains target accuracy with optimized pruning

3. Demonstrates effectiveness on mobile device scenarios

Abstract

Mobile devices run deep learning models for various purposes, such as image classification and speech recognition. Due to the resource constraints of mobile devices, researchers have focused on either making a lightweight deep neural network (DNN) model using model pruning or generating efficient code using compiler optimization. Surprisingly, we found that the straightforward integration of model compression and compiler auto-tuning often does not produce the most efficient model for a target device. We propose CPrune, a compiler-informed model pruning method for efficient target-aware DNN execution that supports an application with a required target accuracy. CPrune makes a lightweight DNN model through informed pruning based on the structural information of subgraphs built during the compiler tuning process. Our experimental results show that CPrune increases the DNN execution speed up…
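The abstract does not spell out the pruning procedure, so the following is only a rough sketch of the general idea of target-aware pruning under an accuracy requirement, not CPrune's actual algorithm. It assumes a greedy loop that shrinks whichever layer gives the largest measured latency reduction while accuracy stays at or above the target. All names here (`prune_until_target`, `toy_latency`, `toy_accuracy`, the layer names, and the cost formulas) are hypothetical stand-ins; in practice the latency callback would be a measurement of compiled code on the target device and the accuracy callback a validation run.

```python
from typing import Callable, Dict

Channels = Dict[str, int]  # layer name -> number of output channels


def prune_until_target(
    channels: Channels,
    measure_latency: Callable[[Channels], float],
    evaluate_accuracy: Callable[[Channels], float],
    target_accuracy: float,
    step: int = 8,
) -> Channels:
    """Greedily remove `step` channels at a time from the layer whose
    pruning lowers measured latency the most, as long as accuracy stays
    at or above `target_accuracy`. Illustrative only."""
    current = dict(channels)  # work on a copy, leave the input untouched
    while True:
        best, best_latency = None, measure_latency(current)
        for name, width in current.items():
            if width <= step:
                continue  # never prune a layer down to zero channels
            candidate = {**current, name: width - step}
            if evaluate_accuracy(candidate) < target_accuracy:
                continue  # this candidate violates the accuracy requirement
            latency = measure_latency(candidate)
            if latency < best_latency:
                best, best_latency = candidate, latency
        if best is None:
            return current  # no feasible candidate improves latency
        current = best


# Hypothetical cost models standing in for on-device measurement
# and validation accuracy.
def toy_latency(ch: Channels) -> float:
    return 0.02 * ch["conv1"] + 0.05 * ch["conv2"]


def toy_accuracy(ch: Channels) -> float:
    return 0.5 + 0.002 * ch["conv1"] + 0.001 * ch["conv2"]


original = {"conv1": 64, "conv2": 128}
pruned = prune_until_target(original, toy_latency, toy_accuracy, target_accuracy=0.705)
print(pruned, toy_latency(pruned), toy_accuracy(pruned))
```

Passing latency and accuracy as callbacks keeps the loop independent of any particular compiler or framework, so the toy functions could be swapped for real measurements without changing the search itself.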

Code & Models

Repositories

taehokim20/cprune (PyTorch, official)

Taxonomy

Topics: Context-Aware Activity Recognition Systems · Parallel Computing and Optimization Techniques · Advanced Neural Network Applications

Methods: Pruning · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings