Partition Pruning: Parallelization-Aware Pruning for Deep Neural Networks

Sina Shahhosseini; Ahmad Albaqsami; Masoomeh Jasemi; Nader Bagherzadeh

arXiv:1901.11391 · cs.CV · February 28, 2019 · 6 cites

Open Access

TL;DR

Partition Pruning is a pruning method that reduces a neural network's parameters with parallel execution in mind, speeding up inference and lowering energy consumption with limited accuracy loss.

Contribution

The paper introduces Partition Pruning, a pruning scheme designed for parallel inference, which improves inference speed and energy efficiency.

Findings

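01. 7.72x speedup in inference performance on the pruned layers of TinyVGG16, relative to the unpruned model on a single accelerator

02. 2.73x reduction in energy consumption under the same comparison

03. Limited accuracy reduction when partitioning fully connected layers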

Abstract

Parameters of recent neural networks require a huge amount of memory. These parameters are used by neural networks to perform machine learning tasks when processing inputs. To speed up inference, we develop Partition Pruning, an innovative scheme to reduce the parameters used while taking parallelization into consideration. We evaluated the performance and energy consumption of parallel inference of partitioned models, which showed a 7.72x speedup in performance and a 2.73x reduction in the energy used for computing the pruned layers of TinyVGG16, in comparison to running the unpruned model on a single accelerator. In addition, our method showed only a limited reduction in accuracy when partitioning fully connected layers.
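
The abstract does not say how weights are chosen for removal, so the following is only an illustrative sketch of the general idea of pruning a fully connected layer with parallelization in mind, not the authors' algorithm. It assumes a simple block-diagonal mask: outputs and inputs are split into contiguous partitions, and every weight connecting different partitions is zeroed, so each partition can compute its outputs from its own input slice. All function names here (`split_ranges`, `block_diagonal_prune`, `partitioned_forward`) are hypothetical.

```python
def split_ranges(n, parts):
    """Split range(n) into `parts` contiguous, near-equal (start, stop) ranges."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def block_diagonal_prune(weights, parts):
    """Assumed pruning rule: keep a weight only if its output row and
    input column belong to the same partition; zero everything else."""
    out_ranges = split_ranges(len(weights), parts)
    in_ranges = split_ranges(len(weights[0]), parts)
    pruned = [[0.0] * len(row) for row in weights]
    for (r0, r1), (c0, c1) in zip(out_ranges, in_ranges):
        for r in range(r0, r1):
            for c in range(c0, c1):
                pruned[r][c] = weights[r][c]
    return pruned, out_ranges, in_ranges


def dense_forward(weights, x):
    """Reference: full matrix-vector product on a single device."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def partitioned_forward(pruned, x, out_ranges, in_ranges):
    """Each partition reads only its own input slice; the per-partition
    outputs are concatenated. The loop body is what one accelerator would run."""
    y = []
    for (r0, r1), (c0, c1) in zip(out_ranges, in_ranges):
        x_local = x[c0:c1]
        for r in range(r0, r1):
            y.append(sum(w * v for w, v in zip(pruned[r][c0:c1], x_local)))
    return y


# Toy 4x6 layer split across two partitions.
weights = [[float(r * 6 + c + 1) for c in range(6)] for r in range(4)]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
pruned, out_ranges, in_ranges = block_diagonal_prune(weights, parts=2)
kept = sum(1 for row in pruned for w in row if w != 0.0)
```

With two partitions, this toy layer keeps 12 of its 24 weights, and `partitioned_forward` gives the same result as `dense_forward` on the pruned matrix. Under this assumed mask, the partitions share no weights or inputs, which is the property that would let them run on separate accelerators without exchanging data for this layer.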

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Human Pose and Action Recognition

Methods: Pruning · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings