Deep Compression for PyTorch Model Deployment on Microcontrollers

Eren Dogan; H. Fatih Ugurdag; Hasan Unlu

arXiv:2103.15972·cs.LG·March 31, 2021

TL;DR

This paper presents a method for compressing PyTorch neural network models using Deep Compression techniques, enabling efficient deployment on microcontrollers by reducing memory and computation requirements.

Contribution

It combines pruning, 8-bit quantization, and sparse matrix storage to optimize PyTorch models for microcontroller deployment, reducing memory footprint and inference time.
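As a rough illustration of the first two steps named above, the sketch below prunes a flat list of weights and then quantizes the survivors to 8-bit integers. It is not the paper's implementation: the helper names (`prune_by_magnitude`, `quantize_int8`, `dequantize`), the magnitude-based pruning criterion, the 50% sparsity target, and the symmetric per-tensor scale are all assumptions chosen to keep the example short.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    Hypothetical helper: the pruning criterion is an assumption.
    """
    k = int(len(weights) * sparsity)  # number of weights to drop
    pruned = list(weights)
    # Indices ordered by ascending magnitude; the first k are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric per-tensor quantization (an assumed scheme).

    Returns (integers in [-127, 127], scale) with w ~= q * scale.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map quantized integers back to approximate float values."""
    return [v * scale for v in q]


# Made-up weights, for illustration only.
weights = [0.02, -0.5, 0.13, 0.0, 0.9, -0.04, 0.31, -0.07]
pruned = prune_by_magnitude(weights, sparsity=0.5)
q, scale = quantize_int8(pruned)
print(pruned)
print(q, scale)
```

Pruned weights map to integer zero exactly under a symmetric scale, so the zeros survive quantization and can be skipped by a sparse storage format afterwards.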

Findings

01. Memory footprint reduced by 12.45x for LeNet-5
02. Inference speed increased by 2.57x
03. Effective deployment on resource-constrained MCUs

Abstract

Neural network deployment on low-cost embedded systems, hence on microcontrollers (MCUs), has recently been attracting more attention than ever. Since MCUs have limited memory capacity as well as limited compute-speed, it is critical that we employ model compression, which reduces both memory and compute-speed requirements. In this paper, we add model compression, specifically Deep Compression, and further optimize Unlu's earlier work on arXiv, which efficiently deploys PyTorch models on MCUs. First, we prune the weights in convolutional and fully connected layers. Secondly, the remaining weights and activations are quantized to 8-bit integers from 32-bit floating-point. Finally, forward pass functions are compressed using special data structures for sparse matrices, which store only nonzero weights (without impacting performance and accuracy). In the case of the LeNet-5 model, the…
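The final step in the abstract, storing only nonzero weights in a sparse data structure, can be sketched with a compressed sparse row (CSR) layout. The paper's actual data structures are not specified in the text above, so CSR is an assumption here, as are the helper names `to_csr` and `sparse_linear` and the made-up integer weights. Python integers stand in for the 8-bit weights and wider accumulators an MCU implementation would use.

```python
def to_csr(dense):
    """Convert a dense matrix (list of rows) to CSR arrays.

    values  : nonzero weights in row-major order
    col_idx : column index of each stored weight
    row_ptr : row r occupies values[row_ptr[r]:row_ptr[r + 1]]
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, w in enumerate(row):
            if w != 0:
                values.append(w)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def sparse_linear(values, col_idx, row_ptr, x, bias):
    """Compute y = W @ x + bias, visiting only the stored nonzero weights."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = bias[r]
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Hypothetical pruned, quantized 3x4 weight matrix (8 of 12 entries are zero).
W = [[0, -71, 0, 18],
     [0, 0, 0, 0],
     [127, 0, 44, 0]]
x = [3, -2, 5, 1]
bias = [10, -4, 0]

values, col_idx, row_ptr = to_csr(W)
y = sparse_linear(values, col_idx, row_ptr, x, bias)
print(values, col_idx, row_ptr)
print(y)
```

The inner loop runs once per stored weight, so a fully pruned row costs only the bias load. The trade-off is the extra `col_idx` and `row_ptr` arrays, which means the layout saves memory only when the matrix is sparse enough.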

Code & Models

Repositories

biarmic/pytorch-compression-for-mcu
PyTorch · Official

Taxonomy

Topics: Real-time simulation and control systems · Embedded Systems Design Techniques · Parallel Computing and Optimization Techniques