Two-Stage Grid Optimization for Group-wise Quantization of LLMs
Junhan Kim, Gukryeol Lee, Seungwoo Son, Jeewook Kim, and Yongkweon Jeon

TL;DR
This paper introduces a two-stage optimization method for group-wise quantization of large language models, explicitly minimizing reconstruction loss and improving accuracy without significant computational overhead.
Contribution
It proposes a two-stage framework for optimizing group scales that incorporates input statistics and explicitly minimizes the layer-wise reconstruction loss, addressing a mismatch in how GPTQ-based quantization determines those scales.
Findings
Achieves higher accuracy than GPTQ-based group-wise quantization of LLMs.
Adds negligible computational overhead.
Incorporates input statistics and inter-group correlations when determining group scales.
Abstract
Group-wise quantization is an effective strategy for mitigating accuracy degradation in low-bit quantization of large language models (LLMs). Among existing methods, GPTQ has been widely adopted due to its efficiency; however, it neglects input statistics and inter-group correlations when determining group scales, leading to a mismatch with its goal of minimizing layer-wise reconstruction loss. In this work, we propose a two-stage optimization framework for group scales that explicitly minimizes the layer-wise reconstruction loss. In the first stage, performed prior to GPTQ, we initialize each group scale to minimize the group-wise reconstruction loss, thereby incorporating input statistics. In the second stage, we freeze the integer weights obtained via GPTQ and refine the group scales to minimize the layer-wise reconstruction loss. To this end, we employ the coordinate descent…
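The two stages described in the abstract can be illustrated with a minimal NumPy sketch for a single weight row. Everything here is an assumption made for illustration and is not taken from the paper: symmetric quantization without zero points, a grid search for the stage-one initialization, round-to-nearest as a stand-in for GPTQ, and the hypothetical names `init_scales`, `quantize_rtn`, `refine_scales`, and `layer_loss`. The matrix `H` plays the role of the input statistics (an input second-moment matrix), and the layer-wise loss for a row is taken to be the quadratic form of the weight error under `H`.

```python
import numpy as np

def layer_loss(w, q, scales, H, group_size):
    """Layer-wise reconstruction loss (w - w_hat)^T H (w - w_hat) for one row."""
    err = w - q * np.repeat(scales, group_size)
    return float(err @ H @ err)

def quantize_rtn(w, scales, group_size, qmax):
    """Round-to-nearest integers per group (a stand-in for GPTQ, not GPTQ itself)."""
    s = np.repeat(scales, group_size)
    return np.clip(np.round(w / s), -qmax, qmax)

def init_scales(w, H, group_size, qmax, n_grid=50):
    """Stage 1 (assumed form): per-group grid search on the group-wise loss,
    which uses only the diagonal block of H belonging to that group."""
    n_groups = w.size // group_size
    scales = np.empty(n_groups)
    for g in range(n_groups):
        sl = slice(g * group_size, (g + 1) * group_size)
        wg, Hg = w[sl], H[sl, sl]
        base = max(np.abs(wg).max() / qmax, 1e-12)
        best = np.inf
        for ratio in np.linspace(0.5, 1.0, n_grid):
            s = base * ratio
            e = wg - s * np.clip(np.round(wg / s), -qmax, qmax)
            loss = e @ Hg @ e
            if loss < best:
                best, scales[g] = loss, s
    return scales

def refine_scales(w, q, scales, H, group_size, n_sweeps=5):
    """Stage 2: coordinate descent over group scales with the integers q frozen.
    Each update is the exact minimizer of the layer-wise loss in one scale."""
    scales = scales.copy()
    for _ in range(n_sweeps):
        for g in range(scales.size):
            sl = slice(g * group_size, (g + 1) * group_size)
            qg = np.zeros_like(w)
            qg[sl] = q[sl]
            denom = qg @ H @ qg
            if denom <= 1e-12:
                continue  # all-zero integers in this group: scale has no effect
            # Target with every other group's contribution removed.
            resid = w - q * np.repeat(scales, group_size) + scales[g] * qg
            scales[g] = (qg @ H @ resid) / denom
    return scales

# Toy data: one row of 64 weights, groups of 16, 4-bit symmetric integers.
rng = np.random.default_rng(0)
d, group_size, qmax = 64, 16, 7
X = rng.normal(size=(d, 256))
H = X @ X.T / X.shape[1]
w = rng.normal(size=d)

s0 = init_scales(w, H, group_size, qmax)
q = quantize_rtn(w, s0, group_size, qmax)
loss_before = layer_loss(w, q, s0, H, group_size)
s1 = refine_scales(w, q, s0, H, group_size)
loss_after = layer_loss(w, q, s1, H, group_size)
print(loss_before, loss_after)
```

In this sketch the off-diagonal blocks of `H` are what couple the groups: stage one ignores them, while each stage-two update uses the full `H`, so the refined scales account for inter-group correlations. Because every coordinate update is an exact one-dimensional minimizer of a convex quadratic, the layer-wise loss cannot increase from one update to the next.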
Taxonomy
Topics: Advanced Data Compression Techniques · Speech Recognition and Synthesis · Advanced Neural Network Applications
