Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning

Elias Frantar; Sidak Pal Singh; Dan Alistarh

arXiv:2208.11580 · cs.LG · January 10, 2023 · 36 cites

TL;DR

This paper introduces a unified, efficient framework for post-training compression of deep neural networks that covers both pruning and quantization, and significantly improves the accuracy-compression trade-off without retraining.

Contribution

A novel compression framework extending the Optimal Brain Surgeon method to include quantization, enabling effective one-shot model compression for DNNs.
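
For readers unfamiliar with Optimal Brain Surgeon (OBS), the following is a minimal sketch of the classical single-weight pruning step on a toy layer-wise least-squares problem. It illustrates the textbook OBS rule only and is not the paper's algorithm or the code in the official repository; the function name `obs_prune_one`, the toy data, and the choice of loss `0.5 * ||w'X - wX||^2` (with Hessian `X @ X.T`) are assumptions made for illustration.

```python
import numpy as np

def obs_prune_one(w, H_inv):
    """Zero out one weight with the classical OBS rule (illustrative helper).

    w     : 1-D weight vector of a single output unit
    H_inv : inverse Hessian of the quadratic loss with respect to w
    """
    # Predicted loss increase for removing each weight, assuming the
    # remaining weights are optimally adjusted afterwards.
    saliency = w ** 2 / (2.0 * np.diag(H_inv))
    p = int(np.argmin(saliency))

    # Compensating update that drives w[p] to zero while minimizing
    # the increase of the quadratic loss.
    delta = -(w[p] / H_inv[p, p]) * H_inv[:, p]
    w_new = w + delta
    w_new[p] = 0.0  # remove floating-point residue
    return w_new, p, saliency[p]

# Toy calibration data (assumed): 4 inputs, 64 samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 64))
w = rng.normal(size=4)

# Hessian of 0.5 * ||w'X - wX||^2 with respect to w'.
H = X @ X.T
H_inv = np.linalg.inv(H)

w_new, p, predicted = obs_prune_one(w, H_inv)

# Actual output error after the OBS update, and after naively zeroing w[p].
actual = 0.5 * np.sum((w_new @ X - w @ X) ** 2)
w_naive = w.copy()
w_naive[p] = 0.0
naive = 0.5 * np.sum((w_naive @ X - w @ X) ** 2)

print(f"pruned index {p}: OBS error {actual:.4f}, naive error {naive:.4f}")
```

Because the toy loss is exactly quadratic, the predicted saliency matches the measured error, and the compensated update never does worse than simply zeroing the same weight.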

Findings

01. Significantly better compression-accuracy trade-offs than existing methods
02. Enables accurate combined pruning and quantization post-training
03. Efficient in time and space for practical deployment

Abstract

We consider the problem of model compression for deep neural networks (DNNs) in the challenging one-shot/post-training setting, in which we are given an accurate trained model, and must compress it without any retraining, based only on a small amount of calibration input data. This problem has become popular in view of the emerging software and hardware support for executing models compressed via pruning and/or quantization with speedup, and well-performing solutions have been proposed independently for both compression approaches. In this paper, we introduce a new compression framework which covers both weight pruning and quantization in a unified setting, is time- and space-efficient, and considerably improves upon the practical performance of existing post-training methods. At the technical level, our approach is based on an exact and efficient realization of the classical Optimal…

Code & Models

Repositories

ist-daslab/obc
PyTorch · Official

Models

Videos

Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning · SlidesLive

Taxonomy

Topics: Medical Imaging Techniques and Applications · Advanced MRI Techniques and Applications · Medical Image Segmentation Techniques

Methods: Pruning