Two-Stage Grid Optimization for Group-wise Quantization of LLMs

Junhan Kim, Gukryeol Lee, Seungwoo Son, Jeewook Kim, and Yongkweon Jeon

arXiv:2602.02126·cs.LG·February 3, 2026

TL;DR

This paper introduces a two-stage method for optimizing the group scales used in group-wise quantization of large language models. It explicitly minimizes the layer-wise reconstruction loss and improves accuracy without significant computational overhead.

Contribution

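The paper proposes a two-stage framework for group scales that builds on GPTQ-based quantization. The first stage incorporates input statistics when initializing each scale, and the second refines the scales to minimize the layer-wise reconstruction loss while the GPTQ integer weights stay frozen.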

Findings

01 Improves accuracy over existing GPTQ-based quantization of LLMs.

02 Adds negligible computational overhead.

03 Incorporates input statistics and inter-group correlations when determining group scales.

Abstract

Group-wise quantization is an effective strategy for mitigating accuracy degradation in low-bit quantization of large language models (LLMs). Among existing methods, GPTQ has been widely adopted due to its efficiency; however, it neglects input statistics and inter-group correlations when determining group scales, leading to a mismatch with its goal of minimizing layer-wise reconstruction loss. In this work, we propose a two-stage optimization framework for group scales that explicitly minimizes the layer-wise reconstruction loss. In the first stage, performed prior to GPTQ, we initialize each group scale to minimize the group-wise reconstruction loss, thereby incorporating input statistics. In the second stage, we freeze the integer weights obtained via GPTQ and refine the group scales to minimize the layer-wise reconstruction loss. To this end, we employ the coordinate descent…
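
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract is truncated before it details the coordinate-descent step, so the closed-form update below is simply the generic exact coordinate minimizer of a quadratic loss. The sketch also assumes symmetric quantization without zero-points, a single weight row, H = X^T X as the input statistics, and plain round-to-nearest as a stand-in for GPTQ. All function names (`init_scale`, `refine_scales`, `layer_loss`) and the shrink-factor grid are hypothetical.

```python
# Illustrative sketch only (not the paper's code). Assumptions: symmetric
# quantization, no zero-points, one weight row, H = X^T X as input statistics.

def quad(u, H, v):
    """Return u^T H v for plain Python lists."""
    n = len(u)
    return sum(u[i] * H[i][j] * v[j] for i in range(n) for j in range(n))


def quantize(w, scale, qmax):
    """Round w / scale to integers clipped to [-qmax, qmax]."""
    return [max(-qmax, min(qmax, round(x / scale))) for x in w]


def init_scale(w_g, H_gg, qmax, n_grid=50):
    """Stage 1 (sketch): grid-search the scale that minimizes the group-wise
    loss (w_g - s*q)^T H_gg (w_g - s*q) for one group."""
    base = max(abs(x) for x in w_g) / qmax or 1.0  # fall back to 1.0 if all zero
    best_s, best_loss = base, float("inf")
    for k in range(1, n_grid + 1):
        s = base * (0.5 + 0.5 * k / n_grid)  # hypothetical shrink factors in (0.5, 1.0]
        q = quantize(w_g, s, qmax)
        err = [x - s * qi for x, qi in zip(w_g, q)]
        loss = quad(err, H_gg, err)
        if loss < best_loss:
            best_s, best_loss = s, loss
    return best_s


def layer_loss(w, q, scales, H, group_size):
    """Layer-wise loss (w - w_hat)^T H (w - w_hat) for one weight row."""
    err = [w[i] - scales[i // group_size] * q[i] for i in range(len(w))]
    return quad(err, H, err)


def refine_scales(w, q, scales, H, group_size, n_sweeps=5):
    """Stage 2 (sketch): keep the integers q frozen and update one group scale
    at a time with the exact minimizer of the layer-wise quadratic loss."""
    d = len(w)
    scales = list(scales)
    for _ in range(n_sweeps):
        for g in range(len(scales)):
            lo, hi = g * group_size, (g + 1) * group_size
            # v: integers of group g, zero elsewhere
            v = [q[i] if lo <= i < hi else 0 for i in range(d)]
            # r: target minus the contribution of all other groups
            r = [w[i] - (0 if lo <= i < hi else scales[i // group_size] * q[i])
                 for i in range(d)]
            denom = quad(v, H, v)
            if denom > 0:  # skip degenerate groups (e.g. all-zero integers)
                scales[g] = quad(v, H, r) / denom
    return scales


# Toy example: 3 calibration inputs, 4 weights, 2 groups, 4-bit symmetric range.
X = [[1.0, 0.5, -0.3, 0.2], [0.2, -1.0, 0.7, 0.4], [0.9, 0.1, 0.3, -0.8]]
d, group_size, qmax = 4, 2, 7
H = [[sum(x[i] * x[j] for x in X) for j in range(d)] for i in range(d)]
w = [0.31, -0.72, 0.05, 0.44]

init, q = [], []
for lo in range(0, d, group_size):
    hi = lo + group_size
    H_gg = [row[lo:hi] for row in H[lo:hi]]  # diagonal block of H for this group
    s = init_scale(w[lo:hi], H_gg, qmax)
    init.append(s)
    q.extend(quantize(w[lo:hi], s, qmax))  # round-to-nearest stand-in for GPTQ

refined = refine_scales(w, q, init, H, group_size)
print(layer_loss(w, q, init, H, group_size),
      layer_loss(w, q, refined, H, group_size))
```

Because each update is the exact minimizer along one coordinate of a quadratic loss, the layer-wise loss in this sketch cannot increase from one sweep to the next. The off-diagonal blocks of H are what couple the groups in the second stage; the first stage only sees each group's diagonal block. The sketch does not guard against a scale turning negative in degenerate cases.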

Taxonomy

Topics: Advanced Data Compression Techniques · Speech Recognition and Synthesis · Advanced Neural Network Applications