SLiM: One-shot Quantization and Sparsity with Low-rank Approximation for LLM Weight Compression
Mohammad Mozaffari, Amir Yazdanbakhsh, Maryam Mehri Dehnavi

TL;DR
SLiM is a one-shot compression framework for large language models that combines quantization, sparsity, and low-rank approximation to cut memory usage while preserving accuracy, without retraining.
Contribution
The paper introduces SLiM, a unified one-shot compression method that integrates hardware-friendly quantization, sparsity, and low-rank approximation, and uses a novel saliency function to compute the low-rank adapters.
Findings
Improves accuracy by up to 5.66% on LLaMA-2-7B with 2:4 sparsity and 4-bit weight quantization.
Provides up to 4.3x and 3.8x speedup on Nvidia RTX3060 and A100 GPUs, respectively.
Reduces end-to-end memory to as little as 0.23x that of dense models.
Abstract
Conventional model compression techniques for LLMs address high memory consumption and slow inference, but typically require computationally expensive retraining to preserve accuracy. In contrast, one-shot compression methods eliminate the retraining cost, but struggle to achieve accuracy comparable to dense models. This paper presents SLiM, a new one-shot compression framework that holistically integrates hardware-friendly quantization, sparsity, and low-rank approximation into a unified process. First, we formulate the quantization process using a probabilistic approach (SLiM-Quant) that enables us to apply uniform quantization. Then, we use an existing one-shot pruning method to apply semi-structured sparsity on top of the quantized weights. Finally, to compensate for the introduced aggregated quantization and sparsity error, we use a novel saliency function with unique…
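The three stages named in the abstract (uniform quantization, semi-structured sparsity, low-rank error compensation) can be illustrated with a minimal NumPy sketch. This is not the paper's algorithm: it substitutes plain round-to-nearest quantization for SLiM-Quant, magnitude-based 2:4 pruning for the one-shot pruning method the authors use, and an unweighted truncated SVD for the saliency-based adapters. All function names, the matrix shape, and the rank of 8 are hypothetical choices made for illustration.

```python
import numpy as np

def uniform_quantize(w, bits=4):
    """Assumed stand-in for SLiM-Quant: symmetric per-tensor round-to-nearest."""
    qmax = 2 ** (bits - 1) - 1                 # 7 for signed 4-bit
    scale = np.abs(w).max() / qmax             # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                           # dequantized values

def prune_2_4(w):
    """Assumed stand-in pruner: zero the 2 smallest-magnitude entries per group of 4."""
    rows, cols = w.shape
    assert cols % 4 == 0, "2:4 sparsity needs the column count to be a multiple of 4"
    groups = w.reshape(rows, cols // 4, 4)
    drop = np.argsort(np.abs(groups), axis=-1)[..., :2]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=-1)
    return (groups * mask).reshape(rows, cols)

def low_rank_correction(err, rank):
    """Best rank-`rank` approximation of the error via plain (unweighted) SVD."""
    u, s, vt = np.linalg.svd(err, full_matrices=False)
    left = u[:, :rank] * s[:rank]              # fold singular values into the left factor
    right = vt[:rank]
    return left, right

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 128)).astype(np.float32)

# Stage 1 and 2: quantize first, then sparsify the quantized weights.
w_c = prune_2_4(uniform_quantize(w, bits=4))

# Stage 3: approximate the combined quantization + sparsity error with low-rank factors.
left, right = low_rank_correction(w - w_c, rank=8)

err_before = np.linalg.norm(w - w_c)
err_after = np.linalg.norm(w - (w_c + left @ right))
print("Frobenius error without adapter:", err_before)
print("Frobenius error with adapter:   ", err_after)
```

At inference time the layer output in such a scheme would be computed as the compressed matrix product plus the two thin adapter products, so the adapter adds only `rank * (rows + cols)` extra parameters per layer.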
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Medical Image Segmentation Techniques · Advanced Image Processing Techniques
Methods: Pruning
