Rapid Deployment of DNNs for Edge Computing via Structured Pruning at Initialization

Bailey J. Eccles; Leon Wong; Blesson Varghese

arXiv:2404.16877 · cs.LG · April 9, 2025

Open Access

TL;DR

This paper introduces Reconvene, a system that uses structured pruning at initialization to quickly generate smaller, faster DNN models suitable for edge devices, maintaining accuracy while significantly reducing size and computation.

Contribution

The paper proposes a structured pruning-at-initialization method and an accompanying system, Reconvene, that enables rapid, efficient edge deployment of DNNs with minimal accuracy loss.

Findings

1. Reconvene produces models up to 16.21x smaller.

2. Models are up to 2x faster in inference.

3. Pruned models match the accuracy obtained with unstructured pruning.

Abstract

Edge machine learning (ML) enables localized processing of data on devices and is underpinned by deep neural networks (DNNs). However, DNNs cannot be easily run on devices due to their substantial computing, memory and energy requirements for delivering performance that is comparable to cloud-based ML. Therefore, model compression techniques, such as pruning, have been considered. Existing pruning methods are problematic for edge ML since they: (1) Create compressed models that have limited runtime performance benefits (using unstructured pruning) or compromise the final model accuracy (using structured pruning), and (2) Require substantial compute resources and time for identifying a suitable compressed DNN model (using neural architecture search). In this paper, we explore a new avenue, referred to as Pruning-at-Initialization (PaI), using structured pruning to mitigate the above…
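The contrast the abstract draws between unstructured and structured pruning can be illustrated with a toy sketch. This is not Reconvene's method: the magnitude and L1-norm scoring rules, the function names `unstructured_prune` and `structured_prune`, and the 8-filter toy layer are all assumptions chosen for illustration. It only shows why unstructured pruning leaves the layer's shape (and so its dense compute cost) unchanged, while structured pruning yields a physically smaller layer.

```python
import random

def unstructured_prune(filters, sparsity):
    """Zero the smallest-magnitude individual weights; the layer keeps its shape."""
    magnitudes = sorted(abs(w) for f in filters for w in f)
    k = round(sparsity * len(magnitudes))
    if k == 0:
        return [list(f) for f in filters]
    threshold = magnitudes[k - 1]
    return [[0.0 if abs(w) <= threshold else w for w in f] for f in filters]

def structured_prune(filters, sparsity):
    """Remove whole filters with the smallest L1 norm; the layer gets smaller."""
    n_keep = max(1, len(filters) - round(sparsity * len(filters)))
    # Rank filter indices by L1 norm (assumed scoring rule, ascending).
    ranked = sorted(range(len(filters)),
                    key=lambda i: sum(abs(w) for w in filters[i]))
    keep = sorted(ranked[-n_keep:])  # preserve the original filter order
    return [list(filters[i]) for i in keep]

random.seed(0)
# Hypothetical layer at initialization: 8 filters, each flattened to 3*3*3 = 27 weights.
layer = [[random.gauss(0.0, 1.0) for _ in range(27)] for _ in range(8)]

sparse = unstructured_prune(layer, 0.5)
small = structured_prune(layer, 0.5)

# Unstructured: still 8 filters of 27 weights, but half the entries are zero.
print(len(sparse), sum(w != 0.0 for f in sparse for w in f))
# Structured: only 4 filters remain, so the stored tensor itself is halved.
print(len(small), sum(len(f) for f in small))
```

Both variants act on freshly initialized weights with no training step, which is the sense in which pruning happens "at initialization". A dense kernel still multiplies by the zeros left by the unstructured variant, whereas the structured result can run on standard hardware as an ordinary, smaller layer.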

Taxonomy

Topics: IoT and Edge/Fog Computing · Advanced Memory and Neural Computing

Methods: Pruning · Convolution