Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs

Wenhua Cheng; Weiwei Zhang; Haihao Shen; Yiyang Cai; Xin He; Kaokao Lv; Yi Liu

arXiv:2309.05516 · cs.CL · October 10, 2024 · 1 citation

TL;DR

This paper introduces SignRound, a novel weight quantization method for large language models that uses signed gradient descent to optimize rounding and clipping, achieving high accuracy with minimal tuning.

Contribution

SignRound combines QAT and PTQ advantages, optimizing weight rounding via signed gradient descent in just 200 steps, reducing tuning costs and inference overhead.
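The idea of tuning rounding with signed gradient descent can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a single linear layer, per-row asymmetric quantization, a straight-through estimator for the gradient of the rounding function, a learning rate of 1/steps with linear decay, and it tunes only the rounding offset `V` (weight clipping is left out). All function names here (`quantize`, `output_mse`, `sign_round_sketch`) are hypothetical.

```python
import numpy as np

def quantize(W, V, bits):
    """Asymmetric per-row fake quantization with a rounding offset V in [-0.5, 0.5]."""
    qmax = 2**bits - 1
    w_min = W.min(axis=1, keepdims=True)
    w_max = W.max(axis=1, keepdims=True)
    scale = np.maximum(w_max - w_min, 1e-8) / qmax
    zero_point = np.round(-w_min / scale)
    q_unclipped = np.round(W / scale + V) + zero_point
    q = np.clip(q_unclipped, 0, qmax)
    in_range = (q_unclipped >= 0) & (q_unclipped <= qmax)  # where the clip is inactive
    return scale * (q - zero_point), scale, in_range

def output_mse(X, W, W_q):
    """Mean squared error between full-precision and quantized layer outputs."""
    return float(np.mean((X @ W_q.T - X @ W.T) ** 2))

def sign_round_sketch(X, W, bits=4, steps=200):
    """Tune V with signed gradient descent; V = 0 is plain round-to-nearest."""
    V = np.zeros_like(W)
    best_V = V.copy()
    best_loss = output_mse(X, W, quantize(W, V, bits)[0])
    for step in range(steps):
        lr = (1.0 / steps) * (1.0 - step / steps)  # assumed schedule, linear decay
        W_q, scale, in_range = quantize(W, V, bits)
        err = X @ W_q.T - X @ W.T                  # shape (n_samples, out_features)
        grad_Wq = 2.0 * err.T @ X / err.size       # d(loss)/d(W_q), shape of W
        # Straight-through estimator: treat round() as identity, so dW_q/dV = scale
        # wherever the value was not clipped.
        grad_V = grad_Wq * scale * in_range
        # Signed update: only the sign of the gradient is used.
        V = np.clip(V - lr * np.sign(grad_V), -0.5, 0.5)
        loss = output_mse(X, W, quantize(W, V, bits)[0])
        if loss < best_loss:                       # keep the best offset seen so far
            best_V, best_loss = V.copy(), loss
    return best_V, best_loss

# Toy data: 64 calibration samples for a layer with 16 inputs and 8 outputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 16))
W = rng.normal(size=(8, 16))
rtn_loss = output_mse(X, W, quantize(W, np.zeros_like(W), 2)[0])
best_V, best_loss = sign_round_sketch(X, W, bits=2, steps=200)
print(f"round-to-nearest MSE: {rtn_loss:.4f}, tuned MSE: {best_loss:.4f}")
```

Because the search starts from `V = 0` and keeps the best offset it has seen, the returned loss in this sketch is never worse than round-to-nearest on the calibration data. After tuning, `V` is folded into the rounded integers, which is consistent with the claim of no additional inference overhead.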

Findings

01. Achieved absolute average accuracy improvements of 6.91% to 33.22% at 2 bits.

02. Demonstrated near-lossless 4-bit quantization in most scenarios.

03. Effective across models and tasks, with minimal tuning effort.

Abstract

Large Language Models (LLMs) have demonstrated exceptional proficiency in language-related tasks, but their deployment poses significant challenges due to substantial memory and storage requirements. Weight-only quantization has emerged as a promising solution, significantly reducing memory and storage needs without sacrificing too much performance. In this study, we introduce SignRound, a method that leverages signed gradient descent (SignSGD) to optimize rounding values and weight clipping in just 200 steps. SignRound integrates the advantages of Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ), delivering exceptional results across 2 to 4 bits while minimizing tuning costs and avoiding additional inference overhead. For example, SignRound achieved absolute average accuracy improvements ranging from 6.91% to 33.22% at 2 bits, as measured by the average zero-shot…
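To make the memory argument for weight-only quantization concrete, the following back-of-the-envelope sketch estimates weight storage at different bit widths. The numbers are illustrative assumptions, not figures from the paper: a hypothetical 7-billion-parameter model, groups of 128 weights, and one 16-bit scale plus one zero point (stored at the weight bit width) per group. The helper `weight_bytes` is hypothetical.

```python
def weight_bytes(n_params, bits, group_size=None, scale_bits=16):
    """Approximate storage for the weights, in bytes.

    If group_size is given, each group of weights also stores one scale
    (scale_bits wide) and one zero point (bits wide). This layout is an
    assumption made for illustration only.
    """
    total_bits = n_params * bits
    if group_size is not None:
        n_groups = n_params / group_size
        total_bits += n_groups * (scale_bits + bits)
    return total_bits / 8

n_params = 7e9  # hypothetical model size
fp16 = weight_bytes(n_params, 16)
w4 = weight_bytes(n_params, 4, group_size=128)
w2 = weight_bytes(n_params, 2, group_size=128)

for name, size in [("fp16", fp16), ("4-bit", w4), ("2-bit", w2)]:
    print(f"{name}: {size / 2**30:.2f} GiB ({fp16 / size:.2f}x smaller than fp16)")
```

Under these assumptions the per-group metadata adds only a few percent on top of the packed weights, so the saving is close to the ratio of bit widths.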

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis