Prune-then-Quantize or Quantize-then-Prune? Understanding the Impact of Compression Order in Joint Model Compression

Minjun Kim, Jaehyeon Choi, Hyunwoo Yang, Jongjin Kim, Jinho Song, and U Kang

arXiv:2603.18426 · cs.AI · March 20, 2026

Open Access · 3 Reviews

TL;DR

This paper investigates how the sequence of applying pruning and quantization affects joint model compression, providing theoretical insights and empirical validation to optimize compression order for better model performance.

Contribution

It introduces the Progressive Intensity Hypothesis and offers theoretical guarantees, advancing understanding of compression order effects in joint model compression.

Findings

01

Compression order significantly impacts model performance.

02

Weaker perturbations should be applied before stronger ones.

03

The hypothesis holds across language and vision models, multi-stage pipelines, and mixed-precision setups.
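
As a toy illustration of findings 01 and 02, the sketch below applies magnitude pruning and symmetric uniform quantization to a random weight matrix in both orders and compares the reconstruction error. Everything here is an assumption made for illustration: the helper names `magnitude_prune`, `uniform_quantize`, and `compress` are hypothetical, the compression settings are arbitrary, and weight-space error is only a rough proxy for model performance. It is not the paper's code or its intensity metric.

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero the round(sparsity * w.size) entries with the smallest magnitude."""
    k = int(round(sparsity * w.size))
    flat = w.copy().ravel()
    if k > 0:
        # Indices of the k smallest-magnitude entries (order among them is arbitrary).
        idx = np.argpartition(np.abs(flat), k - 1)[:k]
        flat[idx] = 0.0
    return flat.reshape(w.shape)

def uniform_quantize(w, bits):
    """Symmetric uniform quantization onto 2**bits - 1 evenly spaced levels."""
    max_abs = np.max(np.abs(w))
    if max_abs == 0:
        return w.copy()
    levels = 2 ** (bits - 1) - 1      # largest integer level, e.g. 7 for 4 bits
    scale = max_abs / levels
    return np.round(w / scale) * scale

def compress(w, stages):
    """Apply (function, argument) stages in the given order."""
    for fn, arg in stages:
        w = fn(w, arg)
    return w

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))         # stand-in for one layer's weights

pq = compress(w, [(magnitude_prune, 0.5), (uniform_quantize, 4)])  # prune, then quantize
qp = compress(w, [(uniform_quantize, 4), (magnitude_prune, 0.5)])  # quantize, then prune

err_pq = np.linalg.norm(w - pq)
err_qp = np.linalg.norm(w - qp)
print(f"prune-then-quantize error: {err_pq:.4f}")
print(f"quantize-then-prune error: {err_qp:.4f}")
```

Changing the sparsity or bit width changes which stage is the stronger perturbation, which is the kind of variation the ordering rule is about. Whether the weight-space gap tracks accuracy on a real model is not something this toy can show.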

Abstract

What happens when multiple compression methods are combined: does the order in which they are applied matter? Joint model compression has emerged as a powerful strategy to achieve higher efficiency by combining multiple methods such as pruning and quantization. A central but underexplored factor in joint model compression is the compression order, or the sequence of different methods within the compression pipeline. Most prior studies have sidestepped the issue by assuming orthogonality between techniques, while a few have examined it only in highly constrained cases. Consequently, the broader role of compression order in shaping model performance remains poorly understood. In this paper, we address the overlooked problem of compression order and provide both theoretical and empirical analysis. We formulate the problem of optimizing the compression order and introduce the…

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. A new and meaningful problem. The paper introduces a rigorous formalization of compression order optimization, a previously neglected but practically crucial dimension in joint model compression. The Progressive Intensity Hypothesis provides a simple, actionable rule with theoretical grounding and broad implications.
2. Clear theoretical analysis. Theoretical results are clearly derived. Theorem 1 establishes performance ordering under disjoint selectivity, relating c

Weaknesses

1. Theoretical assumptions are strong. The “well-designed” compression assumption is idealized and not always met in real pruning heuristics. The analysis could better discuss how violations (e.g., correlated layer errors, adaptive pruning schedules) affect Theorem 2. The independence assumption between layers (Assumption 1) is strong and worth validating empirically with correlation metrics.
2. While extensions to multi-stage and MPQ are demonstrated, the theoretical discussion remains pairwise.

Reviewer 02 · Rating 4 · Confidence 3

Strengths

The paper is well-organized and easy to understand. The paper addresses a practical yet underexplored problem in model compression: the order of joint compression.

Weaknesses

1. The paper introduces the "Compression Equivalent Ratio" (CER) to unify the "intensity" metric. However, calculating the CER for a pruning method requires running the pruning experiment independently to measure its performance (e.g., 65% accuracy), and then finding (or interpolating) the quantization ratio $\mathcal{Q}$ that yields the same performance. This implies that to apply the hypothesis, one must first run all compression methods individually to determine their "intensity" ranking. It
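
The matching step this reviewer describes (measure the pruned model's performance, then find or interpolate the quantization ratio that gives the same performance) can be sketched as a one-dimensional interpolation. This is a minimal sketch under stated assumptions, not the paper's CER definition: the function name `equivalent_quantization_ratio` is hypothetical, performance is assumed to change monotonically with the quantization ratio, and the curve values in the usage lines are made-up placeholders rather than results from the paper.

```python
import numpy as np

def equivalent_quantization_ratio(target_perf, quant_ratios, quant_perfs):
    """Interpolate the quantization ratio whose performance equals target_perf.

    Assumes quant_perfs is monotonic in quant_ratios. Targets outside the
    measured range are clamped to the nearest endpoint by np.interp.
    """
    ratios = np.asarray(quant_ratios, dtype=float)
    perfs = np.asarray(quant_perfs, dtype=float)
    order = np.argsort(perfs)  # np.interp requires increasing x-coordinates
    return float(np.interp(target_perf, perfs[order], ratios[order]))

# Made-up placeholder curve: quantization ratio -> accuracy (%).
ratios = [2.0, 4.0, 8.0]
accuracies = [70.0, 60.0, 40.0]

# A pruned model measured at 65% accuracy (the reviewer's example figure)
# falls halfway between the first two placeholder points.
q_equiv = equivalent_quantization_ratio(65.0, ratios, accuracies)
print(q_equiv)  # 3.0
```

The sketch also makes the reviewer's cost concern concrete: both the pruned model's performance and several points on the quantization curve have to be measured before the interpolation can run.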

Reviewer 03 · Rating 8 · Confidence 2

Strengths

1. The topic is practical and relevant to real-world model deployment.
2. Experiments are extensive, covering multiple models and compression settings.
3. The paper provides a clear and actionable guideline that practitioners can easily adopt.
4. The writing is clear and the code is open-source.

Weaknesses

The paper mainly evaluates compression pipelines that consist of two or three stages (e.g., prune–quantize or prune–quantize–prune). Based on the proposed theory, could this framework be extended to longer or more complex multi-stage compression pipelines, and would the same hypothesis still lead to performance gains compared with existing setups/shorter pipelines?

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Data Compression Techniques · Explainable Artificial Intelligence (XAI)