TL;DR
This paper introduces ZipCal, a fast, model-agnostic data curation method that improves calibration data selection for pruning and quantization of large language models by maximizing lexical diversity.
Contribution
ZipCal is a novel, efficient data curation strategy based on Zipfian laws that outperforms standard sampling and rivals perplexity-based methods in model compression tasks.
Findings
ZipCal consistently outperforms uniform random sampling in pruning benchmarks.
It achieves comparable performance to perplexity-based methods at a fraction of the computational cost.
ZipCal is approximately 240 times faster than perplexity-based approaches.
Abstract
Post-training model compression is essential for enhancing the portability of Large Language Models (LLMs) while preserving their performance. While several compression approaches have been proposed, less emphasis has been placed on selecting the most suitable set of data (the so-called calibration data) for finding the compressed model configuration. The choice of calibration data is a critical step in preserving model capabilities both within and across tasks. In this work, we address the challenge of identifying high-performance calibration sets for both pruning and quantization by analyzing intrinsic data properties rather than model-specific signals. We introduce ZipCal, a model-agnostic data curation strategy that maximizes lexical diversity based on Zipfian power laws. Experiments demonstrate that our method consistently outperforms standard uniform random…
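The abstract describes selecting calibration data by maximizing lexical diversity without consulting the model. The sketch below illustrates that general idea only; it is not the ZipCal algorithm, whose details are not given in this summary. It assumes whitespace tokenization and a simple greedy rule that repeatedly picks the sample adding the most unseen word types. The names `tokenize` and `select_diverse` are hypothetical.

```python
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Return the set of distinct lowercase word types in a sample.

    Assumption: whitespace splitting stands in for a real tokenizer.
    """
    return set(text.lower().split())


def select_diverse(samples: List[str], k: int) -> List[int]:
    """Greedily pick up to k sample indices that maximize vocabulary coverage.

    Illustrative stand-in for a lexical-diversity criterion; it needs no
    model forward passes, only the text itself.
    """
    vocab_sets = [tokenize(s) for s in samples]
    covered: Set[str] = set()
    chosen: List[int] = []
    remaining = set(range(len(samples)))

    for _ in range(min(k, len(samples))):
        # Pick the sample contributing the most word types not yet covered.
        # Iterating in sorted order makes ties resolve to the lowest index.
        best = max(sorted(remaining), key=lambda i: len(vocab_sets[i] - covered))
        chosen.append(best)
        covered |= vocab_sets[best]
        remaining.remove(best)

    return chosen


if __name__ == "__main__":
    corpus = [
        "the cat sat",
        "the cat sat",
        "a dog ran fast",
        "the the the",
    ]
    print(select_diverse(corpus, 2))
```

Because word frequencies in natural text are heavily skewed, a uniformly sampled set tends to repeat the same frequent words, whereas a coverage-driven rule like this one pushes the selection toward samples containing rarer vocabulary. A selection of this kind only reads the text, which is consistent with the speed advantage over perplexity-based scoring reported in the findings.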
