A Survey of Low-bit Large Language Models: Basics, Systems, and Algorithms

Ruihao Gong; Yifu Ding; Zining Wang; Chengtao Lv; Xingyu Zheng; Jinyang Du; Haotong Qin; Jinyang Guo; Michele Magno; Xianglong Liu

arXiv:2409.16694 · cs.AI · November 13, 2025 · 3 cites

TL;DR

This survey comprehensively reviews low-bit quantization techniques for large language models, addressing their principles, system implementations, and algorithms to improve efficiency and reduce resource requirements.

Contribution

The survey provides a systematic overview of low-bit quantization methods for LLMs, covering new data formats, system frameworks, and algorithmic strategies.

Findings

01 Categorization of low-bit quantization techniques
02 Analysis of system implementations across hardware platforms
03 Discussion of future trends and potential advancements

Abstract

Large language models (LLMs) have achieved remarkable advancements in natural language processing, showcasing exceptional performance across various tasks. However, the expensive memory and computational requirements present significant challenges for their practical deployment. Low-bit quantization has emerged as a critical approach to mitigate these challenges by reducing the bit-width of model parameters, activations, and gradients, thus decreasing memory usage and computational demands. This paper presents a comprehensive survey of low-bit quantization methods tailored for LLMs, covering the fundamental principles, system implementations, and algorithmic strategies. An overview of basic concepts and new data formats specific to low-bit LLMs is first introduced, followed by a review of frameworks and systems that facilitate low-bit LLMs across various hardware platforms. Then, we…
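As a rough illustration of the core idea in the abstract (storing values at a lower bit-width to save memory), the Python sketch below assumes the simplest possible scheme: symmetric, per-tensor, round-to-nearest integer quantization with a single floating-point scale. This is a generic textbook baseline, not a method taken from the survey, and the function names `quantize_symmetric`, `dequantize`, and `weight_memory_gib` are hypothetical. The memory helper counts raw weight storage only and ignores scales, activations, and other runtime overhead.

```python
def quantize_symmetric(values, bits):
    """Symmetric per-tensor round-to-nearest quantization (illustrative sketch).

    Maps each float to a signed integer code in [-qmax, qmax] and returns the
    codes together with the single scale needed to dequantize them.
    """
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    # The largest magnitude maps exactly to qmax; guard against an all-zero tensor.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level, then clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from integer codes and the shared scale."""
    return [c * scale for c in codes]


def weight_memory_gib(num_params, bits):
    """Raw weight storage in GiB at a given bit-width (no scales or overhead)."""
    return num_params * bits / 8 / 2**30


if __name__ == "__main__":
    weights = [0.12, -0.5, 0.33, 0.9, -0.07]
    codes, scale = quantize_symmetric(weights, bits=4)
    print("4-bit codes:", codes)
    print("dequantized:", [round(x, 4) for x in dequantize(codes, scale)])
    # A hypothetical 7B-parameter model: 16-bit vs 4-bit weight storage.
    print(f"16-bit: {weight_memory_gib(7e9, 16):.2f} GiB")
    print(f" 4-bit: {weight_memory_gib(7e9, 4):.2f} GiB")
```

Because the scale is chosen so that no value is clipped, each reconstructed value differs from the original by at most half a quantization step (`scale / 2`), and going from 16 to 4 bits cuts raw weight storage by a factor of four. Practical low-bit LLM methods typically refine this baseline, for example with finer-grained (per-channel or per-group) scales, which is part of what motivates the specialized formats and algorithms a survey like this one covers.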

Taxonomy

Topics: Topic Modeling · Neural Networks and Applications · Speech Recognition and Synthesis