SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models

Wei Huang; Haotong Qin; Yangdong Liu; Yawei Li; Qinshuo Liu; Xianglong Liu; Luca Benini; Michele Magno; Shiming Zhang; Xiaojuan Qi

arXiv:2405.14917 · cs.LG · May 27, 2025 · 1 citation

TL;DR

SliM-LLM introduces a salience-driven mixed-precision quantization framework for large language models, improving accuracy and efficiency by adaptively allocating bit-widths based on weight importance, with demonstrated superior performance at low bit-widths.

Contribution

The paper presents a novel salience-driven mixed-precision quantization method that adaptively assigns bit-widths and calibrates quantizers based on weight importance, enhancing LLM compression without sacrificing speed.
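The idea of assigning bit-widths by weight importance can be illustrated with a minimal sketch. This is not the official SliM-LLM algorithm. It assumes a per-weight salience proxy w²·‖X_j‖² (the diagonal of XᵀX standing in for the Hessian) summed over contiguous column groups. It also assumes a simple rank-based rule: the most salient fraction of groups gains one bit, and an equal number of the least salient groups loses one bit, so the layer's average bit-width stays at the target. The helpers `group_salience` and `allocate_bits` and the `frac` parameter are hypothetical. The paper's own allocation search may differ.

```python
from typing import List


def group_salience(weights: List[List[float]], act_sq_norms: List[float],
                   group_size: int) -> List[float]:
    """Sum a per-weight salience proxy over contiguous column groups.

    ASSUMPTION: salience of w[r][j] is w[r][j]**2 * act_sq_norms[j], where
    act_sq_norms[j] = ||X_j||^2 is the diagonal of X^T X (a Hessian-diagonal
    stand-in). This is an illustrative proxy, not the paper's exact metric.
    """
    n_cols = len(weights[0])
    saliences = []
    for start in range(0, n_cols, group_size):
        end = min(start + group_size, n_cols)
        total = 0.0
        for row in weights:
            for j in range(start, end):
                total += row[j] ** 2 * act_sq_norms[j]
        saliences.append(total)
    return saliences


def allocate_bits(saliences: List[float], base_bits: int = 2,
                  frac: float = 0.25) -> List[int]:
    """Hypothetical rank-based allocation that keeps the mean at base_bits.

    The top `frac` of groups by salience get base_bits + 1 and the same number
    of least-salient groups get base_bits - 1; all others keep base_bits.
    """
    n = len(saliences)
    k = min(int(n * frac), n // 2)  # equal promote/demote counts, no overlap
    order = sorted(range(n), key=lambda g: saliences[g])  # ascending salience
    bits = [base_bits] * n
    for g in order[:k]:
        bits[g] = base_bits - 1
    for g in order[n - k:]:
        bits[g] = base_bits + 1
    return bits


if __name__ == "__main__":
    # One output row, eight input columns, groups of two columns.
    w = [[0.1, 0.1, 1.0, 1.0, 0.01, 0.01, 0.5, 0.5]]
    sal = group_salience(w, act_sq_norms=[1.0] * 8, group_size=2)
    print(allocate_bits(sal, base_bits=2, frac=0.25))  # [2, 3, 1, 2]
```

Keeping the promoted and demoted counts equal is what preserves the average bit budget. A mixed-precision layer built this way therefore stores roughly as many weight bits as a uniform 2-bit layer, ignoring any extra per-group metadata.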

Findings

01

2-bit LLaMA-7B reduces memory by 6x

02

Reduces perplexity by 48% relative to state-of-the-art PTQ methods

03

Maintains GPU inference speed
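The 6× memory figure is reported by the paper. The sketch below only shows how such a ratio is commonly estimated and does not reproduce it. It assumes an FP16 baseline, a fixed group size of 128, and 32 bits of metadata per group (an FP16 scale plus an FP16 zero point). These values and the function names `weight_memory_gb` and `compression_ratio` are illustrative assumptions.

```python
def weight_memory_gb(n_params: float, avg_bits: float, group_size: int = 128,
                     meta_bits_per_group: int = 32) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    ASSUMPTION: every group of `group_size` weights carries
    `meta_bits_per_group` bits of metadata (e.g. FP16 scale + FP16 zero point).
    """
    bits_per_weight = avg_bits + meta_bits_per_group / group_size
    return n_params * bits_per_weight / 8 / 1e9


def compression_ratio(n_params: float, avg_bits: float, **kwargs) -> float:
    """Ratio of FP16 weight storage to the quantized estimate above."""
    fp16_gb = n_params * 16 / 8 / 1e9
    return fp16_gb / weight_memory_gb(n_params, avg_bits, **kwargs)


if __name__ == "__main__":
    n = 6.7e9  # roughly the parameter count of LLaMA-7B
    print(f"FP16:  {n * 16 / 8 / 1e9:.2f} GB")
    print(f"2-bit: {weight_memory_gb(n, 2.0):.2f} GB")
    print(f"ratio: {compression_ratio(n, 2.0):.2f}x")
```

Under these assumptions, an average of 2 bits gives a nominal ratio of 16 / 2.25 ≈ 7.1×. The end-to-end figure also depends on which tensors (for example embeddings and the output head) stay in FP16 and on how mixed-precision metadata is stored. This sketch models neither.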

Abstract

Post-training quantization (PTQ) is an effective technique for compressing large language models (LLMs). However, while uniform-precision quantization is computationally efficient, it often compromises model performance. To address this, we propose SliM-LLM, a salience-driven mixed-precision quantization framework that allocates bit-widths at the group level. Our approach leverages the observation that important weights follow a structured distribution and introduces two key components: 1) Salience-Determined Bit Allocation adaptively assigns bit-widths to groups within each layer based on their salience; and 2) Salience-Weighted Quantizer Calibration optimizes quantizer parameters by incorporating element-level salience. With its structured partitioning, SliM-LLM provides a hardware-friendly solution that matches the efficiency of uniform quantization…
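The abstract describes Salience-Weighted Quantizer Calibration only at a high level. The following is a hedged sketch, not the official SliM-LLM code. It assumes a per-group asymmetric uniform quantizer whose range is shrunk by a clipping factor alpha, chosen by a GPTQ-style grid search. The search picks the alpha that minimizes the salience-weighted squared error Σ s_i (w_i − Q(w_i))². The names `quantize_group` and `calibrate_alpha`, the quantizer form, and the search grid are all assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def quantize_group(w: Sequence[float], bits: int, alpha: float) -> List[float]:
    """Asymmetric uniform quantize-dequantize of one group.

    The range [min(w, 0), max(w, 0)] is shrunk by the clipping factor alpha
    (GPTQ-style); values outside the shrunk range are clipped.
    """
    lo = min(min(w), 0.0) * alpha
    hi = max(max(w), 0.0) * alpha
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    out = []
    for x in w:
        q = round((x - lo) / scale)
        q = max(0, min(levels, q))  # clip to the representable integer codes
        out.append(lo + q * scale)
    return out


def calibrate_alpha(w: Sequence[float], salience: Sequence[float], bits: int,
                    grid: Optional[Sequence[float]] = None) -> Tuple[float, float]:
    """Grid-search alpha minimising sum_i salience_i * (w_i - Q(w_i))**2.

    ASSUMPTION: salience is any non-negative per-element importance score;
    the grid 1.00, 0.95, ..., 0.55 is an arbitrary illustrative choice.
    """
    grid = grid or [1.0 - 0.05 * i for i in range(10)]
    best_alpha, best_err = 1.0, float("inf")
    for alpha in grid:
        wq = quantize_group(w, bits, alpha)
        err = sum(s * (x - y) ** 2 for s, x, y in zip(salience, w, wq))
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha, best_err


if __name__ == "__main__":
    w = [-1.0, -0.2, 0.1, 0.3, 0.9]
    uniform = [1.0] * len(w)
    weighted = [5.0, 1.0, 1.0, 1.0, 5.0]  # the two extreme weights matter most
    print(calibrate_alpha(w, uniform, bits=2))
    print(calibrate_alpha(w, weighted, bits=2))
```

Weighting the error by salience lets the search accept larger error on unimportant weights in exchange for representing the salient ones more faithfully. This mirrors the abstract's idea of incorporating element-level salience into quantizer calibration.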

Code & Models

Repositories

Aaronhuang-778/SliM-LLM
PyTorch · Official

Models

AaronHuangWei/SliM-LLM_group-precision (Hugging Face model)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques