RPTQ: Reorder-based Post-training Quantization for Large Language Models
Zhihang Yuan, Lin Niu, Jiawei Liu, Wenyu Liu, Xinggang Wang, Yuzhang Shang, Guangyu Sun, Qiang Wu, Jiaxiang Wu, Bingzhe Wu

TL;DR
This paper introduces RPTQ, a reorder-based post-training quantization method for large language models that effectively reduces memory usage by addressing channel range variations, enabling 3-bit activation quantization.
Contribution
RPTQ is the first to use 3-bit activation quantization in LLMs, improving memory efficiency by mitigating channel range differences through a novel reorder-based approach.
Findings
Achieved 3-bit activation quantization in LLMs.
Reduced memory consumption of OPT-175B by up to 80%.
Reorder-based approach effectively mitigates channel range issues.
Abstract
Large-scale language models (LLMs) have demonstrated impressive performance, but their deployment presents challenges due to their significant memory usage. This issue can be alleviated through quantization. In this paper, we identify that the challenge in quantizing activations in LLMs arises from varying ranges across channels, rather than solely the presence of outliers. To address this challenge, we introduce a quantization method called RPTQ, which utilizes a reorder-based approach. By rearranging the channels and quantizing them in clusters, RPTQ effectively mitigates the impact of range differences between channels. To minimize the overhead of the reorder operation, we fuse it into the layer norm operation and weights in linear layers. In our experiments, RPTQ achieved a significant breakthrough by utilizing 3-bit activation in LLMs for the first time, resulting in a substantial…
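The reorder-and-cluster idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`fake_quantize`, `reorder_and_cluster`, `quantize_clustered`) are hypothetical, the data is synthetic, and channels are grouped by simply sorting on their maximum absolute value and splitting into equal-sized groups, which is an assumption standing in for whatever clustering procedure the paper uses. The sketch also checks that permuting activation channels together with the matching weight rows leaves the linear layer's output unchanged, which is why the reorder can be folded into the weights.

```python
import numpy as np

def fake_quantize(x, n_bits):
    """Asymmetric uniform quantize-dequantize using one scale for the whole array."""
    lo, hi = float(x.min()), float(x.max())
    levels = 2 ** n_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.clip(np.round((x - lo) / scale), 0, levels)
    return q * scale + lo

def reorder_and_cluster(x, n_clusters):
    """Return a channel permutation and index groups over the reordered channels.

    Assumption: channels are sorted by max |value| and split into equal groups.
    """
    perm = np.argsort(np.abs(x).max(axis=0))
    groups = np.array_split(np.arange(x.shape[1]), n_clusters)
    return perm, groups

def quantize_clustered(x, perm, groups, n_bits):
    """Reorder channels, then quantize each cluster with its own parameters."""
    x_r = x[:, perm]
    out = np.empty_like(x_r)
    for g in groups:
        out[:, g] = fake_quantize(x_r[:, g], n_bits)
    return out

rng = np.random.default_rng(0)
n_tokens, n_channels, n_out = 64, 32, 8

# Synthetic activations whose channels have very different ranges.
channel_scale = rng.permutation(np.repeat([0.1, 1.0, 10.0, 100.0], n_channels // 4))
x = rng.normal(size=(n_tokens, n_channels)) * channel_scale
w = rng.normal(size=(n_channels, n_out))

perm, groups = reorder_and_cluster(x, n_clusters=4)

# Permuting activation columns and weight rows together preserves the output.
assert np.allclose(x[:, perm] @ w[perm, :], x @ w)

# Compare 3-bit quantization error: one range for the whole tensor vs. per cluster.
err_tensor = np.mean((fake_quantize(x, 3) - x) ** 2)
err_cluster = np.mean((quantize_clustered(x, perm, groups, 3) - x[:, perm]) ** 2)
print(f"per-tensor MSE: {err_tensor:.4f}, per-cluster MSE: {err_cluster:.4f}")
```

On data like this, where channel ranges differ by orders of magnitude, a single quantization range wastes almost all of its few 3-bit levels on the widest channels, while per-cluster ranges keep the narrow channels resolvable.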
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
