Deep Learning Approximation: Zero-Shot Neural Network Speedup
Michele Pratusevich

TL;DR
This paper introduces Deep Learning Approximation, a technique to rapidly create faster neural networks by structural modifications without retraining, achieving significant speedups with minimal accuracy loss.
Contribution
It presents a novel zero-shot method for neural network speedup using sequential optimizations, including lossless and lossy approximations, without requiring retraining or training data.
Findings
Achieved 2x speedup on YOLO network with 5% mAP drop
Method does not require retraining or access to original training data
Lost accuracy can be regained through fine-tuning
Abstract
Neural networks offer high-accuracy solutions to a range of problems, but are costly to run in production systems because of computational and memory requirements during a forward pass. Given a trained network, we propose a technique called Deep Learning Approximation to build a faster network in a tiny fraction of the time required for training by only manipulating the network structure and coefficients without requiring re-training or access to the training data. Speedup is achieved by applying a sequential series of independent optimizations that reduce the floating-point operations (FLOPs) required to perform a forward pass. First, lossless optimizations are applied, followed by lossy approximations using singular value decomposition (SVD) and low-rank matrix decomposition. The optimal approximation is chosen by weighing the relative accuracy loss and FLOP reduction according to a…
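To make the FLOP argument concrete, here is a minimal sketch of why a low-rank factorization can reduce the cost of a forward pass. It is not the paper's implementation. It assumes a single dense layer y = W x with W of shape (m, n), replaced by a rank-k product U (V x), and it counts multiply-accumulates as a stand-in for FLOPs. The function names are hypothetical and chosen only for illustration.

```python
def dense_cost(m: int, n: int) -> int:
    """Multiply-accumulates for y = W x, with W of shape (m, n)."""
    return m * n


def low_rank_cost(m: int, n: int, k: int) -> int:
    """Multiply-accumulates for y = U (V x), with U: (m, k) and V: (k, n).

    V x costs k * n and U (V x) costs m * k, so the total is k * (m + n).
    """
    return k * (m + n)


def max_useful_rank(m: int, n: int) -> int:
    """Largest rank k for which the factored layer is strictly cheaper.

    k * (m + n) < m * n  is equivalent to  k < m * n / (m + n).
    """
    return (m * n - 1) // (m + n)


if __name__ == "__main__":
    m, n, k = 1024, 1024, 128  # illustrative sizes, not taken from the paper
    ratio = dense_cost(m, n) / low_rank_cost(m, n, k)
    print(f"dense: {dense_cost(m, n)} MACs")
    print(f"rank-{k}: {low_rank_cost(m, n, k)} MACs")
    print(f"cost ratio: {ratio:.1f}x")
    print(f"factoring pays off only for k <= {max_useful_rank(m, n)}")
```

Under these assumptions, the factored layer is cheaper only when k is below m * n / (m + n). Any accuracy lost by truncating to rank k has to be weighed against that saving, which is the trade-off the abstract describes.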
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI
