Rapid Deployment of DNNs for Edge Computing via Structured Pruning at Initialization
Bailey J. Eccles, Leon Wong, Blesson Varghese

TL;DR
This paper introduces Reconvene, a system that uses structured pruning at initialization to quickly generate smaller, faster DNN models suitable for edge devices, maintaining accuracy while significantly reducing size and computation.
Contribution
The paper proposes a novel structured pruning at initialization method and system, Reconvene, enabling rapid, efficient edge deployment of DNNs with minimal accuracy loss.
Findings
Reconvene produces models up to 16.21x smaller.
Models are up to 2x faster in inference.
Pruned models maintain the same accuracy as models produced by unstructured pruning.
Abstract
Edge machine learning (ML) enables localized processing of data on devices and is underpinned by deep neural networks (DNNs). However, DNNs cannot be easily run on devices due to their substantial computing, memory and energy requirements for delivering performance that is comparable to cloud-based ML. Therefore, model compression techniques, such as pruning, have been considered. Existing pruning methods are problematic for edge ML since they: (1) Create compressed models that have limited runtime performance benefits (using unstructured pruning) or compromise the final model accuracy (using structured pruning), and (2) Require substantial compute resources and time for identifying a suitable compressed DNN model (using neural architecture search). In this paper, we explore a new avenue, referred to as Pruning-at-Initialization (PaI), using structured pruning to mitigate the above…
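The contrast the abstract draws between unstructured and structured pruning can be illustrated with a minimal sketch. It is not Reconvene's algorithm: the magnitude and L1-norm ranking criteria, the function names, and the toy layer shape are all assumptions made for illustration. The layer is pruned straight after random initialization, with no training, which is the general idea behind pruning at initialization.

```python
import random

def init_filters(num_filters, filter_size, seed=0):
    """Randomly initialise a layer as a list of flat filters (no training)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(filter_size)]
            for _ in range(num_filters)]

def unstructured_prune(filters, sparsity):
    """Zero the smallest-magnitude individual weights; layer shape is unchanged.

    Assumed criterion: global weight magnitude (hypothetical, for illustration).
    """
    flat = sorted(abs(w) for f in filters for w in f)
    n_prune = int(len(flat) * sparsity)
    if n_prune == 0:
        return [list(f) for f in filters]
    threshold = flat[n_prune - 1]
    return [[0.0 if abs(w) <= threshold else w for w in f] for f in filters]

def structured_prune(filters, sparsity):
    """Drop whole filters with the smallest L1 norm; the layer gets smaller.

    Assumed criterion: per-filter L1 norm (hypothetical, for illustration).
    """
    n_keep = len(filters) - int(len(filters) * sparsity)
    ranked = sorted(range(len(filters)),
                    key=lambda i: sum(abs(w) for w in filters[i]),
                    reverse=True)
    keep = sorted(ranked[:n_keep])  # preserve the original filter order
    return [list(filters[i]) for i in keep]

layer = init_filters(num_filters=8, filter_size=27)  # e.g. 8 filters of 3x3x3
masked = unstructured_prune(layer, sparsity=0.5)
compact = structured_prune(layer, sparsity=0.5)

print(len(masked), len(masked[0]))    # dense shape is unchanged; weights are only zeroed
print(len(compact), len(compact[0]))  # fewer filters to store and compute
```

The unstructured result still occupies a full dense tensor, so standard hardware gains little at runtime unless it has sparse kernels. The structured result is a genuinely smaller dense layer, which is why structured pruning translates more directly into size and latency reductions.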
Taxonomy
Topics: IoT and Edge/Fog Computing · Advanced Memory and Neural Computing
Methods: Pruning · Convolution
