CrAM: A Compression-Aware Minimizer

Alexandra Peste; Adrian Vladu; Eldar Kurtic; Christoph H. Lampert; Dan Alistarh

arXiv:2207.14200 · cs.LG · May 5, 2023 · 1 citation

TL;DR

CrAM is an optimization method that produces neural network models that are inherently stable under compression. This enables high-sparsity pruning with minimal accuracy loss and supports several compression patterns.

Contribution

The paper introduces CrAM, a compression-aware minimizer that improves model stability under pruning. CrAM-trained models are more compressible than models trained with standard optimizers and perform better in transfer learning.

Findings

01 Models trained with CrAM can be pruned in one shot to 70-80% sparsity with minimal accuracy loss.

02 CrAM-trained models can be more accurate than SGD/Adam-trained baselines while remaining stable under compression.

03 CrAM supports semi-structured 2:4 pruning patterns, which are compatible with GPU hardware.
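
The two pruning patterns named in the findings can be illustrated with a minimal NumPy sketch. It assumes plain magnitude-based selection, and the function names `magnitude_prune` and `prune_2_of_4` are hypothetical; this is not code from the ist-daslab/cram repository, and the paper may select weights differently.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """One-shot pruning sketch: zero the `sparsity` fraction of smallest-magnitude entries."""
    flat = np.abs(weights).ravel()
    k = int(round(sparsity * flat.size))      # number of entries to remove
    if k == 0:
        return weights.copy()
    drop = np.argpartition(flat, k - 1)[:k]   # indices of the k smallest magnitudes
    mask = np.ones(flat.size, dtype=bool)
    mask[drop] = False
    return weights * mask.reshape(weights.shape)

def prune_2_of_4(weights):
    """2:4 pattern sketch: keep the 2 largest-magnitude entries in every group of 4."""
    if weights.shape[-1] % 4 != 0:
        raise ValueError("last dimension must be a multiple of 4")
    groups = weights.reshape(-1, 4)
    order = np.argsort(np.abs(groups), axis=1)            # ascending by magnitude
    mask = np.ones(groups.shape, dtype=bool)
    np.put_along_axis(mask, order[:, :2], False, axis=1)  # drop the 2 smallest per group
    return (groups * mask).reshape(weights.shape)

w = np.array([[0.1, -2.0, 0.3, 4.0, 1.0, 1.5, -0.2, 0.05]])
pruned_24 = prune_2_of_4(w)                               # exactly 2 nonzeros per group of 4
pruned_80 = magnitude_prune(np.arange(1.0, 11.0), 0.8)    # 80% of entries set to zero
```

The 2:4 variant fixes the sparsity at 50% but constrains where the zeros fall, whereas unstructured magnitude pruning can reach any target sparsity with no constraint on placement.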

Abstract

Deep neural networks (DNNs) often have to be compressed, via pruning and/or quantization, before they can be deployed in practical settings. In this work we propose a new compression-aware minimizer dubbed CrAM that modifies the optimization step in a principled way, in order to produce models whose local loss behavior is stable under compression operations such as pruning. Thus, dense models trained via CrAM should be compressible post-training, in a single step, without significant accuracy loss. Experimental results on standard benchmarks, such as residual networks for ImageNet classification and BERT models for language modelling, show that CrAM produces dense models that can be more accurate than the standard SGD/Adam-based baselines, but which are stable under weight pruning: specifically, we can prune models in one-shot to 70-80% sparsity with almost no accuracy loss, and to 90%…
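
The abstract does not spell out how the optimization step is modified, so the following is only one plausible reading, sketched on a toy least-squares problem. The assumed form is: perturb the dense weights along the gradient, compress (prune) the perturbed copy, evaluate the gradient there, and apply that gradient to the dense weights. The perturbation size `rho`, the toy loss, and every function name are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def prune_smallest(theta, sparsity):
    """Hypothetical compression operator: zero the smallest-magnitude entries."""
    k = int(round(sparsity * theta.size))
    if k == 0:
        return theta.copy()
    drop = np.argpartition(np.abs(theta), k - 1)[:k]
    out = theta.copy()
    out[drop] = 0.0
    return out

def loss_grad(theta, X, y):
    """Gradient of the toy loss 0.5 * mean((X @ theta - y) ** 2)."""
    return X.T @ (X @ theta - y) / X.shape[0]

def compression_aware_step(theta, X, y, lr=0.1, rho=0.05, sparsity=0.5):
    """Assumed update: the gradient is taken at a pruned, perturbed copy of the weights."""
    g = loss_grad(theta, X, y)
    perturbed = theta + rho * g                  # small step along the gradient
    compressed = prune_smallest(perturbed, sparsity)
    g_compressed = loss_grad(compressed, X, y)   # how the loss behaves after compression
    return theta - lr * g_compressed             # the dense weights are what gets updated

theta = np.array([1.0, -0.2, 0.5, 0.05])
X, y = np.eye(4), np.zeros(4)
new_theta = compression_aware_step(theta, X, y)
```

In this sketch the model stays dense throughout training; pruning appears only inside the step, which is consistent with the abstract's claim that the resulting dense model can be compressed in a single step afterwards.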

Code & Models

Repositories

ist-daslab/cram (PyTorch, official)

Videos

CrAM: A Compression-Aware Minimizer · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Pruning · Layer Normalization · Residual Connection · Dropout · Weight Decay · Adam · Softmax · WordPiece