RPTQ: Reorder-based Post-training Quantization for Large Language Models

Zhihang Yuan; Lin Niu; Jiawei Liu; Wenyu Liu; Xinggang Wang; Yuzhang Shang; Guangyu Sun; Qiang Wu; Jiaxiang Wu; Bingzhe Wu

arXiv:2304.01089·cs.CL·May 18, 2023·20 cites

TL;DR

This paper introduces RPTQ, a reorder-based post-training quantization method for large language models that effectively reduces memory usage by addressing channel range variations, enabling 3-bit activation quantization.

Contribution

RPTQ is the first method to quantize LLM activations to 3 bits. It improves memory efficiency by mitigating the range differences between channels through a novel reorder-based approach.

Findings

1. Achieved 3-bit activation quantization in LLMs.
2. Reduced the memory consumption of OPT-175B by up to 80%.
3. The reorder-based approach effectively mitigates differences in channel ranges.
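To illustrate why grouping channels by range helps, here is a minimal pure-Python sketch on synthetic data. It assumes, as a simplification, that channels are grouped by k-means on their calibration (min, max) pairs, reordered so each cluster is contiguous, and that each cluster then gets its own asymmetric 3-bit scale and zero point. All function names (`reorder_and_quantize`, `kmeans`, and so on) are hypothetical and not taken from the authors' code.

```python
import random

def quantize(values, lo, hi, bits=3):
    """Asymmetric uniform quantize-then-dequantize of `values` using range [lo, hi]."""
    levels = 2 ** bits - 1                       # 3 bits -> integer codes 0..7
    scale = (hi - lo) / levels if hi > lo else 1.0
    zero = round(-lo / scale)                    # integer zero point
    out = []
    for v in values:
        q = min(max(round(v / scale) + zero, 0), levels)  # clamp to the code range
        out.append((q - zero) * scale)           # dequantize back to float
    return out

def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on 2-D points; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return labels

def channel_ranges(acts):
    """Per-channel (min, max) over calibration tokens; acts is [tokens][channels]."""
    return [(min(t[c] for t in acts), max(t[c] for t in acts))
            for c in range(len(acts[0]))]

def reorder_and_quantize(acts, k, bits=3):
    """Hypothetical cluster-wise quantization; returns (permutation, dequantized acts)."""
    ranges = channel_ranges(acts)
    labels = kmeans(ranges, k)
    # Permutation that makes channels of the same cluster contiguous.
    order = sorted(range(len(ranges)), key=lambda c: labels[c])
    # One (lo, hi) per cluster, covering every member channel's range.
    bounds = {}
    for c, (lo, hi) in enumerate(ranges):
        blo, bhi = bounds.get(labels[c], (lo, hi))
        bounds[labels[c]] = (min(blo, lo), max(bhi, hi))
    result = []
    for t in acts:
        row = [t[c] for c in order]              # token in reordered channel layout
        qrow, start = [], 0
        while start < len(row):                  # quantize one cluster slice at a time
            lab = labels[order[start]]
            end = start
            while end < len(row) and labels[order[end]] == lab:
                end += 1
            lo, hi = bounds[lab]
            qrow.extend(quantize(row[start:end], lo, hi, bits))
            start = end
        restored = [0.0] * len(order)            # undo the permutation for comparison
        for pos, c in enumerate(order):
            restored[c] = qrow[pos]
        result.append(restored)
    return order, result

def mse(a, b):
    errs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(errs) / len(errs)

if __name__ == "__main__":
    rng = random.Random(1)
    scales = [1.0, 50.0] * 4      # hypothetical: alternating narrow and wide channels
    acts = [[rng.uniform(-s, s) for s in scales] for _ in range(64)]
    _, clustered = reorder_and_quantize(acts, k=2)
    lo = min(min(t) for t in acts)
    hi = max(max(t) for t in acts)
    per_tensor = [quantize(t, lo, hi) for t in acts]
    print("per-tensor MSE:", mse(acts, per_tensor))
    print("clustered  MSE:", mse(acts, clustered))
```

With a single per-tensor range, the wide channels set a 3-bit step so coarse that the narrow channels collapse to zero; giving each cluster its own range keeps the wide channels' error unchanged while shrinking the narrow channels' error.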

Abstract

Large-scale language models (LLMs) have demonstrated impressive performance, but their deployment presents challenges due to their significant memory usage. This issue can be alleviated through quantization. In this paper, we identify that the challenge in quantizing activations in LLMs arises from varying ranges across channels, rather than solely the presence of outliers. To address this challenge, we introduce a quantization method called RPTQ, which utilizes a reorder-based approach. By rearranging the channels and quantizing them in clusters, RPTQ effectively mitigates the impact of range differences between channels. To minimize the overhead of the reorder operation, we fuse it into the layer norm operation and weights in linear layers. In our experiments, RPTQ achieved a significant breakthrough by utilizing 3-bit activation in LLMs for the first time, resulting in a substantial…
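The abstract states that the reorder is fused into the layer norm and the weights of linear layers. A minimal pure-Python sketch of why such a fusion can be exact, assuming one token vector passing through a LayerNorm and then a linear layer; the helpers `reordered_layer_norm`, `permute_input_columns`, and `permute_output_rows` are hypothetical names, not the API of hahnyuan/rptq4llm.

```python
import math

def layer_norm(x, gamma, beta, eps=1e-5):
    """Standard LayerNorm over one token's channel vector."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std * g + b for v, g, b in zip(x, gamma, beta)]

def permute(vec, order):
    """Return vec reordered so that position j holds vec[order[j]]."""
    return [vec[i] for i in order]

def reordered_layer_norm(x, gamma, beta, order, eps=1e-5):
    """Hypothetical fused kernel: mean/variance ignore channel order, so reading x
    in permuted order with permuted gamma/beta yields the reordered LN output."""
    return layer_norm(permute(x, order), permute(gamma, order), permute(beta, order), eps)

def linear(W, x, bias):
    """y = W x + bias, with W stored as [out_features][in_features]."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(W, bias)]

def permute_input_columns(W, order):
    """Reorder W's input columns so it consumes an already-reordered activation."""
    return [permute(row, order) for row in W]

def permute_output_rows(W, bias, order):
    """Reorder a producing layer's output rows (and bias) so it emits reordered outputs."""
    return [W[i] for i in order], permute(bias, order)

if __name__ == "__main__":
    x = [0.5, -1.2, 3.0, 0.1, -0.7, 2.2]
    gamma = [1.0, 0.9, 1.1, 1.2, 0.8, 1.05]
    beta = [0.0, 0.1, -0.1, 0.2, -0.2, 0.05]
    order = [2, 5, 0, 3, 1, 4]               # hypothetical cluster-derived permutation
    W = [[0.1 * (i + 1) * (j - 2) for j in range(6)] for i in range(3)]
    bias = [0.5, -0.5, 0.0]
    y_ref = linear(W, layer_norm(x, gamma, beta), bias)
    y_fused = linear(permute_input_columns(W, order),
                     reordered_layer_norm(x, gamma, beta, order), bias)
    print("max difference:", max(abs(a - b) for a, b in zip(y_ref, y_fused)))
```

Because LayerNorm's mean and variance do not depend on channel order, permuted gamma and beta let the norm emit its output already reordered, and permuting the consuming weight's input columns (or a producing weight's output rows) leaves the layer's result unchanged up to floating-point rounding. Under these assumptions, no separate reorder kernel is needed at inference time.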

Code & Models

Repositories

hahnyuan/rptq4llm (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis