NeUQI: Near-Optimal Uniform Quantization Parameter Initialization for Low-Bit LLMs
Li Lin, Xinyu Hu, Xiaojun Wan

TL;DR
NeUQI is a method for initializing uniform quantization parameters in low-bit LLMs. It improves post-quantization performance and efficiency over the conventional Min-Max formula.
Contribution
The paper proposes NeUQI, a near-optimal initialization technique for uniform quantization that simplifies parameter optimization and enhances low-bit LLM performance.
Findings
NeUQI outperforms existing initialization methods on LLaMA and Qwen models.
Combining NeUQI with distillation surpasses PV-tuning in performance.
NeUQI reduces quantization error and improves model accuracy after quantization.
Abstract
Large language models (LLMs) achieve impressive performance across domains but face significant challenges when deployed on consumer-grade GPUs or personal devices such as laptops, due to high memory consumption and inference costs. Post-training quantization (PTQ) of LLMs offers a promising solution that reduces their memory footprint and decoding latency. In practice, PTQ with uniform quantization representation is favored due to its efficiency and ease of deployment, as uniform quantization is widely supported by mainstream hardware and software libraries. Recent studies on low-bit uniform quantization have led to noticeable improvements in post-quantization model performance; however, they mainly focus on quantization methodologies, while the initialization of quantization parameters remains underexplored and still relies on the conventional Min-Max formula. In this work, we…
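For context, the conventional Min-Max initialization that the abstract refers to can be sketched as follows. This is a minimal illustration of the baseline only, not of NeUQI itself, and it assumes asymmetric per-group uniform quantization with an integer zero-point. The function names `minmax_init`, `quantize`, and `dequantize` are hypothetical and do not come from the paper.

```python
def minmax_init(weights, bits):
    """Conventional Min-Max initialization (baseline sketch, not NeUQI).

    Maps the observed range [min, max] onto the integer grid 0 .. 2**bits - 1.
    Returns (scale, zero_point).
    """
    levels = 2 ** bits - 1
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels
    if scale == 0.0:
        # Degenerate group where all weights are equal; any positive scale works.
        scale = 1.0
    zero_point = round(-w_min / scale)
    return scale, zero_point


def quantize(weights, scale, zero_point, bits):
    """Round each weight to the nearest grid point and clamp to the valid range."""
    levels = 2 ** bits - 1
    return [min(max(round(w / scale) + zero_point, 0), levels) for w in weights]


def dequantize(codes, scale, zero_point):
    """Map integer codes back to approximate real-valued weights."""
    return [scale * (q - zero_point) for q in codes]


# Example: one small weight group quantized to 2 bits.
group = [-1.0, -0.5, 0.0, 0.5, 2.0]
scale, zero_point = minmax_init(group, bits=2)
codes = quantize(group, scale, zero_point, bits=2)
approx = dequantize(codes, scale, zero_point)
```

Because Min-Max ties the grid to the two extreme values, a single outlier such as `2.0` above stretches the scale and coarsens the grid for every other weight in the group, which becomes more costly as the bit width drops.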
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Image and Signal Denoising Methods · CCD and CMOS Imaging Sensors
Methods: Focus · LLaMA
