Mixed-Precision Graph Neural Quantization for Low Bit Large Language Models
Wanlong Liu, Yichen Xiao, Dingyi Zeng, Hongyang Zhao, Wenyu Chen, Malu Zhang

TL;DR
This paper introduces MG-PTQ, a mixed-precision graph neural network-based post-training quantization method that improves low-bit quantization of large language models, enabling resource-efficient deployment with better accuracy.
Contribution
The paper proposes a novel GNN-based mixed-precision PTQ approach that adaptively assigns bit-widths to weights, significantly enhancing low-bit quantization performance for LLMs.
Findings
Outperforms GPTQ at low-bit levels on WikiText2 and C4 datasets.
Achieves new state-of-the-art results in low-bit LLM quantization.
Effectively captures weight dependencies for optimized quantization.
Abstract
Post-Training Quantization (PTQ) is pivotal for deploying large language models (LLMs) in resource-limited settings because it significantly reduces resource demands. However, existing PTQ strategies underperform at low bit levels (< 3 bits) due to the significant difference between the quantized and original weights. To enhance quantization performance at low bit-widths, we introduce a Mixed-precision Graph Neural PTQ (MG-PTQ) approach, employing a graph neural network (GNN) module to capture dependencies among weights and adaptively assign quantization bit-widths. Through the information propagation of the GNN module, our method more effectively captures dependencies among target weights, leading to a more accurate assessment of weight importance and optimized allocation of quantization strategies. Extensive experiments on the WikiText2 and C4 datasets demonstrate that our MG-PTQ…
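The general idea of importance-driven mixed-precision quantization can be illustrated with a small NumPy sketch. This is not the authors' MG-PTQ implementation: the learned GNN is replaced here by one hand-written round of mean-aggregation message passing, and the per-column granularity, the importance proxy, the correlation-based graph, the bit choices {2, 4}, and the 2.5-bit average budget are all assumptions made for illustration. All function names are hypothetical.

```python
import numpy as np

def quantize_uniform(w, bits):
    """Min/max uniform quantization of a 1-D array to 2**bits levels."""
    levels = 2 ** bits - 1
    lo, hi = w.min(), w.max()
    if hi == lo:
        return w.copy()
    scale = (hi - lo) / levels
    return np.round((w - lo) / scale) * scale + lo

def column_importance(W, X, steps=1, mix=0.5):
    """Hypothetical importance score per input column of W.

    Assumption: the initial score is weight energy times mean squared
    activation, and the graph links input features by absolute correlation.
    A learned GNN would replace this fixed propagation rule.
    """
    score = (W ** 2).sum(axis=0) * (X ** 2).mean(axis=0)
    adj = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(adj, 0.0)
    adj = adj / adj.sum(axis=1, keepdims=True)  # row-normalize (mean aggregation)
    for _ in range(steps):
        score = (1 - mix) * score + mix * (adj @ score)
    return score

def assign_bits(score, low=2, high=4, target=2.5):
    """Give `high` bits to the top-scoring columns so the average equals `target`."""
    n = score.size
    n_high = int(round(n * (target - low) / (high - low)))
    bits = np.full(n, low, dtype=int)
    bits[np.argsort(score)[::-1][:n_high]] = high
    return bits

def quantize_mixed(W, bits):
    """Quantize each column of W with its own bit-width."""
    Wq = np.empty_like(W)
    for j, b in enumerate(bits):
        Wq[:, j] = quantize_uniform(W[:, j], int(b))
    return Wq

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))    # toy layer: 64 outputs, 32 inputs
X = rng.normal(size=(256, 32))   # toy calibration activations

bits = assign_bits(column_importance(W, X))
W_mixed = quantize_mixed(W, bits)
W_2bit = quantize_mixed(W, np.full(W.shape[1], 2))

print("average bits:", bits.mean())
print("weight error, mixed :", np.linalg.norm(W - W_mixed))
print("weight error, 2-bit :", np.linalg.norm(W - W_2bit))
```

On random toy data like this, the sketch only shows the mechanics: spending the extra 0.5 bits per weight on the highest-scoring columns lowers reconstruction error relative to uniform 2-bit quantization. It says nothing about the perplexity results reported in the paper.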
Taxonomy
Topics: Advanced Graph Neural Networks · Topic Modeling
Methods: Graph Neural Network
