Mixed-Precision Graph Neural Quantization for Low Bit Large Language Models

Wanlong Liu; Yichen Xiao; Dingyi Zeng; Hongyang Zhao; Wenyu Chen; Malu Zhang

arXiv:2501.18154 · cs.CL · January 31, 2025

Open Access

TL;DR

This paper introduces MG-PTQ, a mixed-precision graph neural network-based post-training quantization method that improves low-bit quantization of large language models, enabling resource-efficient deployment with better accuracy.

Contribution

The paper proposes a novel GNN-based mixed-precision PTQ approach that adaptively assigns bit-widths to weights, significantly enhancing low-bit quantization performance for LLMs.

Findings

1. Outperforms GPTQ at low-bit levels on WikiText2 and C4 datasets.

2. Achieves new state-of-the-art results in low-bit LLM quantization.

3. Effectively captures weight dependencies for optimized quantization.

Abstract

Post-Training Quantization (PTQ) is pivotal for deploying large language models (LLMs) in resource-limited settings because it significantly reduces resource demands. However, existing PTQ strategies underperform at low bit levels (< 3 bits) due to the significant difference between the quantized and original weights. To enhance quantization performance at low bit-widths, we introduce a Mixed-precision Graph Neural PTQ (MG-PTQ) approach, employing a graph neural network (GNN) module to capture dependencies among weights and adaptively assign quantization bit-widths. Through the information propagation of the GNN module, our method more effectively captures dependencies among target weights, leading to a more accurate assessment of weight importance and optimized allocation of quantization strategies. Extensive experiments on the WikiText2 and C4 datasets demonstrate that our MG-PTQ…
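The abstract does not specify the GNN architecture, how the weight graph is built, or the allocation rule, so the following is only a minimal sketch of the general idea rather than the MG-PTQ method itself. It assumes a hand-made graph over weight columns, a single round of mean-aggregation message passing as a stand-in for the GNN module, a simple top-fraction rule for assigning bit-widths, and plain round-to-nearest uniform quantization. All function names, the importance proxy, and the example data are hypothetical.

```python
from typing import List


def propagate(features: List[float], adjacency: List[List[int]]) -> List[float]:
    """One round of mean-aggregation message passing (self included).

    Stand-in for a GNN layer: each node's score becomes the mean of its own
    feature and its neighbors' features. No learned parameters are involved.
    """
    out = []
    for i, neighbors in enumerate(adjacency):
        group = [features[i]] + [features[j] for j in neighbors]
        out.append(sum(group) / len(group))
    return out


def assign_bits(scores: List[float], low_bits: int = 2, high_bits: int = 4,
                high_fraction: float = 0.5) -> List[int]:
    """Give high_bits to the top-scoring fraction of nodes, low_bits to the rest."""
    n_high = int(round(len(scores) * high_fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    bits = [low_bits] * len(scores)
    for i in ranked[:n_high]:
        bits[i] = high_bits
    return bits


def quantize_column(column: List[float], bits: int) -> List[float]:
    """Uniform round-to-nearest quantization; returns dequantized values."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return list(column)  # constant column: nothing to quantize
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels
    return [lo + round((w - lo) / scale) * scale for w in column]


# Hypothetical example: four weight columns connected in a chain graph.
columns = [
    [0.02, -0.01, 0.03],
    [0.90, -0.70, 0.40],
    [0.50, 0.45, -0.60],
    [0.05, 0.00, -0.04],
]
adjacency = [[1], [0, 2], [1, 3], [2]]

# Assumed importance proxy: mean squared magnitude of each column.
raw_scores = [sum(w * w for w in col) / len(col) for col in columns]
scores = propagate(raw_scores, adjacency)
bits = assign_bits(scores)
quantized = [quantize_column(col, b) for col, b in zip(columns, bits)]

print("bit-widths per column:", bits)
print("average bit-width:", sum(bits) / len(bits))
```

In this sketch the average bit-width is controlled only by `high_fraction`. A learned allocator, as the abstract describes, would instead be trained to decide bit-widths from the propagated weight features.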

Taxonomy

Topics: Advanced Graph Neural Networks · Topic Modeling

Methods: Graph Neural Network