On the Impact of Calibration Data in Post-training Quantization and Pruning

Miles Williams; Nikolaos Aletras

arXiv:2311.09755 · cs.CL · November 6, 2024 · 2 cites

Open Access

TL;DR

This paper systematically investigates how calibration data influences the effectiveness of post-training quantization and pruning methods for large language models, revealing significant performance variations and providing practical recommendations.

Contribution

It is the first comprehensive empirical study on the impact of calibration data on LLM compression techniques, highlighting variability and offering guidelines for better calibration practices.

Findings

01. Calibration data significantly affects model performance.

02. Performance varies widely across different methods and datasets.

03. Recommendations improve the robustness of compression techniques.

Abstract

Quantization and pruning form the foundation of compression for neural networks, enabling efficient inference for large language models (LLMs). Recently, various quantization and pruning techniques have demonstrated remarkable performance in a post-training setting. They rely upon calibration data, a small set of unlabeled examples that are used to generate layer activations. However, no prior work has systematically investigated how the calibration data impacts the effectiveness of model compression methods. In this paper, we present the first extensive empirical study on the effect of calibration data upon LLM performance. We trial a variety of quantization and pruning methods, datasets, tasks, and models. Surprisingly, we find substantial variations in downstream task performance, contrasting existing work that suggests a greater level of robustness to the calibration data. Finally,…
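As a rough illustration of why the choice of calibration data can matter, the sketch below uses a deliberately simple scheme that is not taken from the paper: a symmetric 8-bit quantizer whose scale is set from the largest absolute activation seen in the calibration set. The activations are synthetic Gaussian samples, and every name here (`absmax_scale`, `fake_quantize`, `calib_narrow`, `calib_wide`) is hypothetical. The methods studied in the paper use calibration activations in more involved ways, so this only shows the general dependence, not their behavior.

```python
import random


def absmax_scale(calibration_activations, num_bits=8):
    """Choose a symmetric quantization scale from calibration activations."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    peak = max(abs(a) for a in calibration_activations)
    return peak / qmax


def fake_quantize(values, scale, num_bits=8):
    """Round to the integer grid, clip to the representable range, map back."""
    qmax = 2 ** (num_bits - 1) - 1
    quantized = []
    for v in values:
        q = max(-qmax, min(qmax, round(v / scale)))
        quantized.append(q * scale)
    return quantized


def mean_squared_error(reference, approx):
    return sum((r - a) ** 2 for r, a in zip(reference, approx)) / len(reference)


rng = random.Random(0)

# Assumption: synthetic stand-ins for layer activations. One calibration set
# has a much narrower spread than the data seen at evaluation time.
calib_narrow = [rng.gauss(0.0, 0.5) for _ in range(128)]
calib_wide = [rng.gauss(0.0, 4.0) for _ in range(128)]
eval_acts = [rng.gauss(0.0, 4.0) for _ in range(1024)]

scale_narrow = absmax_scale(calib_narrow)
scale_wide = absmax_scale(calib_wide)

err_narrow = mean_squared_error(eval_acts, fake_quantize(eval_acts, scale_narrow))
err_wide = mean_squared_error(eval_acts, fake_quantize(eval_acts, scale_wide))

print(f"scale from narrow calibration set: {scale_narrow:.4f}, MSE: {err_narrow:.4f}")
print(f"scale from wide calibration set:   {scale_wide:.4f}, MSE: {err_wide:.4f}")
```

In this toy setting, the scale derived from the narrow calibration set clips most evaluation activations, so its reconstruction error is far larger than with the calibration set that matches the evaluation distribution.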

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Sparse Evolutionary Training · Pruning