A Systematic Study of Compression Ordering for Large Language Models

Shivansh Chhawri; Rahul Mahadik; Suparna Rooj

arXiv:2511.19495·cs.LG·November 26, 2025

Open Access

TL;DR

This paper systematically investigates how the order in which compression techniques (pruning, distillation, and quantization) are applied affects the performance and compression ratio of large language models, and offers practical guidelines for efficient deployment.

Contribution

The paper presents a comprehensive analysis of compression-technique sequences for LLMs and identifies the ordering that best balances compression and performance.

Findings

01 Quantization achieves the highest standalone compression.

02 Pruning causes moderate quality degradation.

03 Pruning, then distillation, then quantization (P-KD-Q) yields the best balance between compression and performance.
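
To make the idea of "ordering" concrete, the sketch below enumerates every possible sequence of the three techniques using the P / KD / Q abbreviations from the finding above. It is only an illustration: the helper name `all_orderings` is hypothetical, and this page does not say whether the paper evaluates all six orderings or only a subset.

```python
from itertools import permutations

# Abbreviations as used in "P-KD-Q": pruning, knowledge distillation, quantization.
TECHNIQUES = ("P", "KD", "Q")


def all_orderings(techniques=TECHNIQUES):
    """Return every ordering of the given techniques as a dash-joined label.

    Hypothetical helper for illustration; not taken from the paper.
    """
    return ["-".join(order) for order in permutations(techniques)]


orderings = all_orderings()
for label in orderings:
    print(label)
```

Three techniques give 3! = 6 candidate pipelines, so an exhaustive comparison of orderings is cheap to enumerate even though each pipeline is expensive to run.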

Abstract

Large Language Models (LLMs) require substantial computational resources, making model compression essential for efficient deployment in constrained environments. The individual effects of the dominant compression techniques (knowledge distillation, structured pruning, and low-bit quantization) are well studied, but their interactions and optimal sequencing remain unclear. This work systematically examines how these techniques perform both independently and in combination when applied to the Qwen2.5 3B model. We evaluate multiple compression pipelines, including single-technique and proposed three-technique sequences, using perplexity, G-Eval, clarity, prompt alignment, and compression ratio as metrics. Our experiments show that quantization provides the greatest standalone compression, while pruning introduces moderate quality degradation. Critically, the ordering of techniques…
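
Two of the metrics named in the abstract, perplexity and compression ratio, have common textbook definitions that can be sketched briefly. The code below assumes those standard definitions (perplexity as the exponential of the mean per-token negative log-likelihood, compression ratio as original size divided by compressed size); the excerpt here does not state the paper's exact formulas, and the function names are hypothetical.

```python
import math


def perplexity(token_nlls):
    """Perplexity from per-token negative log-likelihoods (natural log).

    Assumes the standard definition exp(mean NLL); the paper's exact
    evaluation setup is not given in this excerpt.
    """
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def compression_ratio(original_bytes, compressed_bytes):
    """Size of the original model divided by the size of the compressed model.

    Assumed definition; a value above 1 means the model got smaller.
    """
    if compressed_bytes <= 0:
        raise ValueError("compressed size must be positive")
    return original_bytes / compressed_bytes


# Toy inputs for illustration only; these are not results from the paper.
toy_ppl = perplexity([math.log(4.0)] * 3)
toy_ratio = compression_ratio(8, 2)
```

Under these definitions, lower perplexity and higher compression ratio are both better, which is why a pipeline has to be judged on the trade-off between the two rather than on either alone.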

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education