FAQ: Mitigating Quantization Error via Regenerating Calibration Data with Family-Aware Quantization
Haiyang Xiao, Weiqing Li, Jinyue Guo, Guochao Jiang, Guohua Liu, Yuewei Zhang

TL;DR
FAQ regenerates calibration data using a larger LLM from the same family as the target model, improving post-training quantization accuracy for resource-limited deployment.
Contribution
It proposes a framework that regenerates high-fidelity calibration data leveraging family knowledge, addressing the limitations of traditional PTQ calibration data.
Findings
Reduces accuracy loss by up to 28.5% on multiple models
Generates calibration data with Chain-of-Thought reasoning
Enhances PTQ effectiveness with regenerated data
Abstract
Although post-training quantization (PTQ) provides an efficient numerical compression scheme for deploying large language models (LLMs) on resource-constrained devices, the representativeness and universality of calibration data remain a core bottleneck in determining the accuracy of quantization parameters. Traditional PTQ methods typically rely on limited samples, making it difficult to capture the activation distribution during the inference phase, leading to biases in quantization parameters. To address this, we propose FAQ (Family-Aware Quantization), a calibration data regeneration framework that leverages prior knowledge from LLMs of the same family to generate high-fidelity calibration samples. Specifically, FAQ first inputs the original calibration samples into a larger LLM from the same family as the target model, regenerating a series of high-fidelity calibration…
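To illustrate why unrepresentative calibration data biases quantization parameters, here is a minimal sketch using a generic asymmetric min/max 8-bit scheme. This is not FAQ's method or any specific PTQ library's API; the function names (`calibrate`, `fake_quantize`, `mean_abs_error`) and the toy activation values are assumptions made purely for illustration.

```python
def calibrate(samples, num_bits=8):
    """Derive min/max quantization parameters from calibration activations."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The representable range is forced to include zero.
    lo = min(min(samples), 0.0)
    hi = max(max(samples), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale
    zero_point = round(qmin - lo / scale)
    return scale, zero_point, qmin, qmax


def fake_quantize(x, scale, zero_point, qmin, qmax):
    """Quantize x to an integer grid, clip, and map back to a float."""
    q = round(x / scale) + zero_point
    q = max(qmin, min(qmax, q))
    return (q - zero_point) * scale


def mean_abs_error(values, params):
    """Average absolute error introduced by quantizing `values`."""
    return sum(abs(v - fake_quantize(v, *params)) for v in values) / len(values)


# Toy stand-in for inference-time activations (hypothetical values).
inference = [-6.0, -2.5, -1.0, -0.2, 0.0, 0.3, 1.1, 2.4, 5.5, 7.0]

narrow_calib = [-1.0, -0.2, 0.0, 0.3, 1.1]  # misses the tails of the distribution
broad_calib = [-6.0, -1.0, 0.0, 1.1, 7.0]   # covers the tails

narrow_err = mean_abs_error(inference, calibrate(narrow_calib))
broad_err = mean_abs_error(inference, calibrate(broad_calib))
print(f"narrow calibration error: {narrow_err:.4f}")
print(f"broad calibration error:  {broad_err:.4f}")
```

In this toy setting, the narrow calibration set yields a range that clips the larger activations, so its error is dominated by clipping, while the broader set keeps every value within half a quantization step. Real PTQ pipelines typically use more elaborate range estimators, but the dependence on calibration coverage is the same in kind.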
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
