CrAM: A Compression-Aware Minimizer
Alexandra Peste, Adrian Vladu, Eldar Kurtic, Christoph H. Lampert, Dan Alistarh

TL;DR
CrAM is an optimization method that produces neural network models that are inherently stable under compression, enabling high-sparsity pruning with minimal accuracy loss and supporting a range of compression patterns.
Contribution
CrAM is a compression-aware minimizer that improves model stability under pruning; models trained with it are more compressible and transfer better than models trained with standard optimizers.
Findings
Models trained with CrAM can be pruned in one shot to 70-80% sparsity with almost no accuracy loss.
Dense CrAM-trained models can be more accurate than SGD/Adam-trained baselines while remaining stable under compression.
CrAM supports semi-structured 2:4 pruning patterns, which are compatible with GPU hardware.
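To make the two pruning patterns in these findings concrete, here is a minimal NumPy sketch of one-shot magnitude pruning to a target sparsity and of a 2:4 mask (two nonzero weights kept in every group of four). The function names `magnitude_prune` and `prune_2_4` are hypothetical, and plain magnitude is assumed as the saliency criterion; the page does not say which criterion the authors use.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero the `sparsity` fraction of entries with the smallest magnitude (one shot)."""
    flat = np.abs(weights).ravel()
    k = int(round(sparsity * flat.size))  # number of weights to remove
    if k == 0:
        return weights.copy()
    # Indices of the k smallest magnitudes (order within them does not matter).
    drop = np.argpartition(flat, k - 1)[:k]
    mask = np.ones(flat.size, dtype=bool)
    mask[drop] = False
    return weights * mask.reshape(weights.shape)

def prune_2_4(weights):
    """Keep the 2 largest-magnitude entries in each group of 4 consecutive weights."""
    if weights.shape[-1] % 4 != 0:
        raise ValueError("last dimension must be divisible by 4")
    groups = weights.reshape(-1, 4)
    # Per group, indices of the 2 smallest magnitudes, which are dropped.
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return (groups * mask).reshape(weights.shape)

# Usage on a toy weight matrix (assumed shape, for illustration only).
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16))
w_sparse = magnitude_prune(w, 0.75)  # unstructured, 75% of entries zeroed
w_24 = prune_2_4(w)                  # semi-structured, exactly 50% zeroed
```

The 2:4 pattern always yields 50% sparsity, but its regular layout is what makes it usable by hardware with sparse matrix support, whereas unstructured pruning can reach higher sparsity without that layout guarantee.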
Abstract
Deep neural networks (DNNs) often have to be compressed, via pruning and/or quantization, before they can be deployed in practical settings. In this work we propose a new compression-aware minimizer dubbed CrAM that modifies the optimization step in a principled way, in order to produce models whose local loss behavior is stable under compression operations such as pruning. Thus, dense models trained via CrAM should be compressible post-training, in a single step, without significant accuracy loss. Experimental results on standard benchmarks, such as residual networks for ImageNet classification and BERT models for language modelling, show that CrAM produces dense models that can be more accurate than the standard SGD/Adam-based baselines, but which are stable under weight pruning: specifically, we can prune models in one-shot to 70-80% sparsity with almost no accuracy loss, and to 90%…
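The abstract does not give the update rule, so the following is only an illustrative sketch of what a compression-aware step could look like, not the authors' algorithm as stated here. It assumes the gradient is evaluated at a perturbed and then compressed copy of the weights and applied to the dense weights. The names `top_k_compress`, `loss_and_grad`, and `compression_aware_step`, the parameter `rho`, and the toy least-squares problem are all hypothetical.

```python
import numpy as np

def top_k_compress(theta, sparsity):
    """Hypothetical compression operator on a 1-D weight vector: zero the smallest magnitudes."""
    k = int(round(sparsity * theta.size))
    out = theta.copy()
    if k > 0:
        drop = np.argpartition(np.abs(theta), k - 1)[:k]
        out[drop] = 0.0
    return out

def loss_and_grad(theta, X, y):
    """Halved mean squared error of a linear model and its gradient."""
    r = X @ theta - y
    return 0.5 * np.mean(r ** 2), X.T @ r / len(y)

def compression_aware_step(theta, X, y, lr=0.1, rho=0.05, sparsity=0.5):
    """One assumed compression-aware update (illustrative, not from the source)."""
    # 1. Gradient at the current dense weights.
    _, g = loss_and_grad(theta, X, y)
    # 2. Perturb along the gradient, then compress the perturbed weights.
    theta_c = top_k_compress(theta + rho * g, sparsity)
    # 3. Gradient at the compressed point, applied to the dense weights.
    _, g_c = loss_and_grad(theta_c, X, y)
    return theta - lr * g_c

# Usage on a toy regression problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 8))
y = rng.normal(size=32)
theta = rng.normal(size=8)
for _ in range(100):
    theta = compression_aware_step(theta, X, y)
```

The design intent of such a step is that the dense weights are pushed toward regions where the loss stays low after compression, which is the stability property the abstract describes.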
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Multi-Head Attention · Pruning · Layer Normalization · Residual Connection · Dropout · Weight Decay · Adam · Softmax · WordPiece
