UniComp: A Unified Evaluation of Large Language Model Compression via Pruning, Quantization and Distillation

Jonathan von Rad; Yong Cao; Andreas Geiger

arXiv:2602.09130 · cs.LG · May 7, 2026

TL;DR

UniComp is a unified framework for evaluating LLM compression by pruning, quantization, and distillation along three dimensions (performance, reliability, and efficiency), showing how compression affects knowledge retention, reliability, and task-specific performance.

Contribution

The paper introduces a unified evaluation framework for pruning, quantization, and knowledge distillation, combining capability- and safety-oriented benchmarks with a hardware-aware efficiency analysis.

Findings

01

Factual recall is largely preserved after compression.

02

Multi-step reasoning, multilingual, and instruction-following capabilities degrade.

03

Task-specific calibration improves reasoning performance by up to 50%.
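
The findings above compare how much of the uncompressed model's score each capability keeps after compression. A minimal sketch of that comparison follows, assuming a simple compressed-to-baseline score ratio averaged per capability group. The function names, benchmark names, and all scores are hypothetical placeholders chosen for illustration; they are not UniComp's metric definitions or results.

```python
from statistics import mean


def retention(baseline: dict[str, float], compressed: dict[str, float]) -> dict[str, float]:
    """Per-benchmark ratio of compressed score to baseline score (1.0 = fully retained)."""
    return {name: compressed[name] / baseline[name] for name in baseline}


def by_capability(ratios: dict[str, float], groups: dict[str, list[str]]) -> dict[str, float]:
    """Average the per-benchmark ratios inside each capability group."""
    return {cap: mean(ratios[b] for b in benches) for cap, benches in groups.items()}


# Hypothetical placeholder scores (accuracy in [0, 1]); not taken from the paper.
baseline = {"fact_qa": 0.80, "math_steps": 0.50, "multilingual_qa": 0.60}
compressed = {"fact_qa": 0.76, "math_steps": 0.30, "multilingual_qa": 0.42}

# Hypothetical grouping of benchmarks into capabilities.
groups = {
    "factual_recall": ["fact_qa"],
    "multi_step_reasoning": ["math_steps"],
    "multilingual": ["multilingual_qa"],
}

ratios = retention(baseline, compressed)
summary = by_capability(ratios, groups)

for capability, ratio in summary.items():
    print(f"{capability}: {ratio:.2f} of baseline retained")
```

Reporting a ratio rather than a raw score keeps benchmarks with different baseline difficulty comparable, which is what makes a pattern such as "recall retained, reasoning degraded" visible at a glance.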

Abstract

Model compression is increasingly essential for deploying large language models (LLMs), yet existing comparative studies largely focus on pruning and quantization evaluated primarily on knowledge-centric benchmarks. Thus, we introduce UniComp, a unified evaluation framework for comparing pruning, quantization, and knowledge distillation. UniComp evaluates compressed models along three dimensions: performance, reliability, and efficiency, using a diverse set of capability- and safety-oriented benchmarks together with a hardware-aware efficiency analysis. Through evaluation of six compression techniques across 40 datasets, we observe (i) a consistent knowledge bias, where factual recall is largely preserved while multi-step reasoning, multilingual, and instruction-following capabilities degrade; (ii) a decoupling between performance and reliability, indicating that retained performance…
