Deep Compression for PyTorch Model Deployment on Microcontrollers
Eren Dogan, H. Fatih Ugurdag, Hasan Unlu

TL;DR
This paper presents a method for compressing PyTorch neural network models using Deep Compression techniques, enabling efficient deployment on microcontrollers by reducing memory and computation requirements.
Contribution
The paper combines pruning, quantization, and sparse matrix storage to optimize PyTorch models specifically for microcontroller deployment, reducing memory footprint and increasing inference speed.
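The first two steps named here, pruning and quantization, can be illustrated with a minimal sketch. It assumes magnitude-based pruning and symmetric per-tensor 8-bit quantization, which are common choices but are not confirmed by this page as the authors' exact scheme. The function names `prune_by_magnitude` and `quantize_int8` are hypothetical, and plain Python lists stand in for PyTorch tensors.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out roughly the smallest-magnitude `sparsity` fraction of weights.

    Assumption: magnitude pruning with a single global threshold.
    Ties at the threshold are also pruned, so slightly more than the
    requested fraction may be removed.
    """
    magnitudes = sorted(abs(w) for row in weights for w in row)
    k = int(len(magnitudes) * sparsity)  # number of weights to drop
    if k == 0:
        return [list(row) for row in weights]
    threshold = magnitudes[k - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] with one shared scale.

    Assumption: symmetric per-tensor quantization; a float weight is
    recovered approximately as q * scale.
    """
    max_abs = max(abs(w) for row in weights for w in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [
        [max(-127, min(127, round(w / scale))) for w in row] for row in weights
    ]
    return quantized, scale


if __name__ == "__main__":
    layer = [[0.9, -0.05, 0.0, 0.3], [-0.02, 0.6, 0.1, -1.27]]
    pruned = prune_by_magnitude(layer, sparsity=0.5)
    q, scale = quantize_int8(pruned)
    print(pruned)
    print(q, scale)
```

Pruned weights stay exactly zero after quantization, which is what makes a sparse storage format worthwhile afterwards.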
Findings
Memory footprint reduced by 12.45x for LeNet-5
Inference speed increased by 2.57x
Effective deployment on resource-constrained MCUs
Abstract
Neural network deployment on low-cost embedded systems, and hence on microcontrollers (MCUs), has recently been attracting more attention than ever. Since MCUs have limited memory capacity as well as limited compute speed, it is critical that we employ model compression, which reduces both memory and compute requirements. In this paper, we add model compression, specifically Deep Compression, to Unlu's earlier work on arXiv, which efficiently deploys PyTorch models on MCUs, and optimize that work further. First, we prune the weights in convolutional and fully connected layers. Second, the remaining weights and activations are quantized from 32-bit floating-point to 8-bit integers. Finally, forward pass functions are compressed using special data structures for sparse matrices, which store only nonzero weights (without impacting performance and accuracy). In the case of the LeNet-5 model, the…
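To make the sparse-storage step concrete, here is a minimal sketch of a fully connected forward pass that touches only nonzero weights. It assumes the compressed sparse row (CSR) layout; the abstract does not say which sparse data structure the authors use, so CSR is an illustrative stand-in. The helper names `dense_to_csr` and `csr_matvec` are hypothetical, and the weights are assumed to be already pruned and quantized to integers.

```python
def dense_to_csr(matrix):
    """Convert a dense 2-D list into CSR arrays (values, col_idx, row_ptr).

    Only nonzero entries are stored. row_ptr[r] and row_ptr[r + 1]
    delimit the slice of `values` that belongs to row r.
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, w in enumerate(row):
            if w != 0:
                values.append(w)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = W @ x, visiting only the stored nonzero weights."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0  # integer accumulator when weights and inputs are integers
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    # Hypothetical pruned and quantized 2x4 weight matrix.
    w_q = [[90, 0, 0, 30], [0, 60, 0, -127]]
    values, col_idx, row_ptr = dense_to_csr(w_q)
    print(values, col_idx, row_ptr)
    print(csr_matvec(values, col_idx, row_ptr, [1, 2, 3, 4]))
```

In this layout the inner loop runs once per stored weight rather than once per matrix entry, so both storage and multiply-accumulate work shrink as more weights are pruned, at the cost of the extra index arrays.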
Taxonomy
Topics: Real-time simulation and control systems · Embedded Systems Design Techniques · Parallel Computing and Optimization Techniques
