A Programmable Approach to Neural Network Compression

Vinu Joseph; Saurav Muralidharan; Animesh Garg; Michael Garland; Ganesh Gopalakrishnan

arXiv:1911.02497 · cs.LG · December 3, 2020

TL;DR

This paper presents Condensa, a programmable system that uses Bayesian optimization to automate the design of neural network compression strategies, reducing memory footprint and runtime without manual trial-and-error experimentation.

Contribution

Introduces Condensa, a Python-based programmable framework that automatically infers optimal compression strategies for DNNs using Bayesian optimization.
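To make the search problem concrete: for a fixed compression strategy, the goal is to find the highest sparsity whose accuracy still meets a user-set threshold, using only a few expensive evaluations. The paper solves this with Bayesian optimization; the sketch below deliberately swaps in plain bisection as a simpler stand-in, and it is only valid under the assumption (ours, not the paper's) that accuracy is non-increasing in sparsity. The names `search_sparsity`, `evaluate`, and `min_accuracy` are hypothetical, and the default budget of ten evaluations simply mirrors the "at most ten samples per search" figure.

```python
from typing import Callable


def search_sparsity(
    evaluate: Callable[[float], float],
    min_accuracy: float,
    budget: int = 10,
    lo: float = 0.0,
    hi: float = 1.0,
) -> float:
    """Return the highest sparsity found whose accuracy meets min_accuracy.

    Hypothetical stand-in for Bayesian optimization: plain bisection.
    Assumes accuracy is non-increasing in sparsity and that `lo` is feasible.
    Calls `evaluate` (compress + fine-tune + measure, in practice) exactly
    `budget` times.
    """
    best = lo
    for _ in range(budget):
        mid = (lo + hi) / 2
        if evaluate(mid) >= min_accuracy:
            best = mid  # feasible: try pruning more aggressively
            lo = mid
        else:
            hi = mid    # infeasible: back off toward lower sparsity
    return best


# Toy accuracy curve (hypothetical): flat until 80% sparsity, then degrading.
def toy_accuracy(sparsity: float) -> float:
    return 0.95 - 0.5 * max(0.0, sparsity - 0.8)


print(search_sparsity(toy_accuracy, min_accuracy=0.94))
```

Bisection needs no surrogate model, which keeps the sketch short, but unlike Bayesian optimization it breaks down if the accuracy curve is not monotone.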

Findings

01

Achieved memory footprint reductions of up to 188x

02

Realized runtime throughput improvements of up to 2.59x on hardware

03

Operates effectively with at most ten samples per search

Abstract

Deep neural networks (DNNs) frequently contain far more weights, represented at a higher precision, than are required for the specific task which they are trained to perform. Consequently, they can often be compressed using techniques such as weight pruning and quantization that reduce both the model size and inference time without appreciable loss in accuracy. However, finding the best compression strategy and corresponding target sparsity for a given DNN, hardware platform, and optimization objective currently requires expensive, frequently manual, trial-and-error experimentation. In this paper, we introduce a programmable system for model compression called Condensa. Users programmatically compose simple operators, in Python, to build more complex and practically interesting compression strategies. Given a strategy and user-provided objective (such as minimization of running time),…
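The abstract describes building compression strategies by composing simple operators in Python. The following is a minimal sketch of that idea on a flat list of weights, chaining magnitude pruning with symmetric uniform quantization. The names `prune`, `quantize`, and `compose` are hypothetical illustrations, not Condensa's actual API (the NVlabs/condensa repository defines the real interface), and quantized values are kept as floats for simplicity.

```python
from typing import Callable, List

Weights = List[float]
Operator = Callable[[Weights], Weights]


def prune(sparsity: float) -> Operator:
    """Hypothetical operator: zero out the smallest-magnitude `sparsity` fraction."""
    def op(w: Weights) -> Weights:
        k = int(round(sparsity * len(w)))  # number of weights to remove
        order = sorted(range(len(w)), key=lambda i: abs(w[i]))
        drop = set(order[:k])
        return [0.0 if i in drop else x for i, x in enumerate(w)]
    return op


def quantize(bits: int) -> Operator:
    """Hypothetical operator: symmetric uniform quantization to `bits` bits."""
    def op(w: Weights) -> Weights:
        levels = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits
        scale = max((abs(x) for x in w), default=0.0) / levels
        if scale == 0.0:
            return list(w)  # all-zero input: nothing to quantize
        return [round(x / scale) * scale for x in w]
    return op


def compose(*ops: Operator) -> Operator:
    """Chain operators so they are applied left to right."""
    def op(w: Weights) -> Weights:
        for f in ops:
            w = f(w)
        return w
    return op


# A "prune then quantize" strategy built from the two primitives.
strategy = compose(prune(0.5), quantize(8))
print(strategy([0.9, -0.05, 0.3, -0.6, 0.01, 0.2]))
```

Because each operator is just a function from weights to weights, new strategies come from reordering or re-parameterizing the same primitives rather than writing new code, which is the property the search over sparsity targets relies on.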


Code & Models

Repositories

NVlabs/condensa
PyTorch · Official


Taxonomy

Methods: Pruning