Self-calibration for Language Model Quantization and Pruning

Miles Williams; George Chrysostomou; Nikolaos Aletras

arXiv:2410.17170 · cs.CL · July 15, 2025

TL;DR

This paper introduces self-calibration, a data-free approach to quantizing and pruning language models in which the calibration data is synthetic text generated by the model itself. It requires no external calibration data and often outperforms calibration with real data.

Contribution

The paper presents a novel self-calibration technique that eliminates the need for external data in model compression, leveraging the model to generate synthetic calibration data.
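
A minimal sketch of the generation step, assuming a hypothetical toy bigram model (`TOY_BIGRAM`) stands in for the language model being compressed. Each synthetic example starts from the beginning-of-sequence token alone and is extended by temperature sampling until an end token or a length cap. The fixed temperature, sequence lengths and helper names (`sample_next`, `self_calibration_set`) are illustrative assumptions, not the authors' configuration.

```python
import random

# Hypothetical toy "language model": next-token probabilities conditioned
# on the previous token only (a bigram model). A real setup would instead
# sample from the language model that is about to be compressed.
TOY_BIGRAM = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"model": 0.5, "data": 0.5},
    "a": {"model": 0.7, "token": 0.3},
    "model": {"generates": 0.8, "</s>": 0.2},
    "data": {"is": 0.6, "</s>": 0.4},
    "token": {"is": 1.0},
    "generates": {"the": 0.5, "a": 0.5},
    "is": {"the": 0.3, "a": 0.3, "</s>": 0.4},
}


def sample_next(probs, temperature, rng):
    """Sample one token from a temperature-scaled distribution."""
    tokens = list(probs)
    # Renormalised p ** (1 / T) equals softmax(log p / T).
    weights = [probs[t] ** (1.0 / temperature) for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def self_calibration_set(model, num_examples, max_len, temperature=1.0, seed=0):
    """Generate synthetic calibration sequences from the model itself,
    prompting each one with only the beginning-of-sequence token."""
    rng = random.Random(seed)
    examples = []
    for _ in range(num_examples):
        seq = ["<s>"]
        while len(seq) < max_len:
            nxt = sample_next(model[seq[-1]], temperature, rng)
            seq.append(nxt)
            if nxt == "</s>":  # stop at the end-of-sequence token
                break
        examples.append(seq)
    return examples


calib = self_calibration_set(TOY_BIGRAM, num_examples=4, max_len=8)
for seq in calib:
    print(" ".join(seq))
```

In practice the returned sequences would be tokenized and fed through the model in place of sampled web text when running a calibration-dependent quantization or pruning method.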

Findings

1. Self-calibration often outperforms calibration with real data.
2. The approach is effective across a range of models and compression methods.
3. Self-calibration maintains strong downstream task performance.

Abstract

Quantization and pruning are fundamental approaches for model compression, enabling efficient inference for language models. In a post-training setting, state-of-the-art quantization and pruning methods require calibration data, a small set of unlabeled examples. Conventionally, this is randomly sampled web text, aiming to reflect the model training data. However, this poses two key problems: (1) unrepresentative calibration examples can harm model performance, and (2) organizations increasingly avoid releasing model training data. In this paper, we propose self-calibration as a solution. Our approach requires no external data, instead leveraging the model itself to generate synthetic calibration data, with a view to better approximating the pre-training data distribution. We extensively compare the performance of self-calibration with several baselines, across a variety of models,…
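
To make the role of calibration data concrete, the sketch below shows two common ways post-training methods consume calibration activations. These are swapped-in illustrations, not necessarily the methods evaluated in the paper: the pruning score is the one used by Wanda, |W_ij| · ||X_j||_2 compared within each output row, and the quantization step is simple symmetric abs-max int8 scaling of activations. All function names are hypothetical, and activations are plain Python lists rather than tensors.

```python
import math


def feature_norms(calib_activations):
    """L2 norm of each input feature over all calibration tokens.
    calib_activations: list of token vectors, each of length in_features."""
    in_features = len(calib_activations[0])
    return [math.sqrt(sum(x[j] ** 2 for x in calib_activations))
            for j in range(in_features)]


def wanda_prune(weight, calib_activations, sparsity):
    """Zero the lowest-scoring weights in each output row, scoring
    weight[i][j] by |weight[i][j]| * ||X_j||_2 (Wanda-style).
    weight: list of rows, shape out_features x in_features."""
    norms = feature_norms(calib_activations)
    pruned = []
    for row in weight:
        scores = [abs(w) * n for w, n in zip(row, norms)]
        k = int(len(row) * sparsity)  # number of weights removed per row
        drop = set(sorted(range(len(row)), key=lambda j: scores[j])[:k])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


def int8_activation_scale(calib_activations):
    """Symmetric per-tensor scale mapping the calibration abs-max to 127."""
    absmax = max(abs(v) for x in calib_activations for v in x)
    return absmax / 127.0 if absmax > 0 else 1.0


def quantize_int8(values, scale):
    """Round values to int8 levels, clamping to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]
```

Because both paths read statistics off the calibration activations, unrepresentative calibration text skews the feature norms and the abs-max range. For example, a large weight whose input feature is always zero on the calibration set scores zero and is pruned first.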

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Sparse Evolutionary Training · Pruning