HierarchicalPrune: Position-Aware Compression for Large-Scale Diffusion Models

Young D. Kwon; Rui Li; Sijia Li; Da Li; Sourav Bhattacharya; Stylianos I. Venieris

arXiv:2508.04663 · cs.CV · March 3, 2026

TL;DR

HierarchicalPrune is a novel compression framework that leverages the functional hierarchy of diffusion model blocks to significantly reduce model size and inference latency while maintaining high output quality.

Contribution

This work introduces HierarchicalPrune, combining position-aware pruning, weight preservation, and sensitivity-guided distillation for effective diffusion model compression.

Findings

01. Achieves up to 80% memory reduction with minimal quality loss.

02. Reduces inference latency by up to 38%.

03. Maintains perceptual quality comparable to that of the original models.

Abstract

State-of-the-art text-to-image diffusion models (DMs) achieve remarkable quality, yet their massive parameter scale (8-11B) poses significant challenges for inference on resource-constrained devices. In this paper, we present HierarchicalPrune, a novel compression framework grounded in a key observation: DM blocks exhibit distinct functional hierarchies, where early blocks establish semantic structures while later blocks handle texture refinements. HierarchicalPrune synergistically combines three techniques: (1) Hierarchical Position Pruning, which identifies and removes less essential later blocks based on position hierarchy; (2) Positional Weight Preservation, which systematically protects early model portions that are essential for semantic structural integrity; and (3) Sensitivity-Guided Distillation, which adjusts knowledge-transfer intensity based on our discovery of block-wise…
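The abstract describes the three techniques only at a high level. The sketch below is a hypothetical illustration of the general idea, not the authors' implementation. It assumes that each block has a scalar importance score (for example, the quality drop measured when that block is skipped) and a scalar sensitivity, that "protecting" early blocks is modeled simply as excluding a leading prefix of blocks from pruning, and that distillation is a per-block feature MSE weighted by normalized sensitivity. The function names, `protected_fraction`, and the score inputs are all assumptions; the paper's actual scoring and protection mechanisms may differ.

```python
from typing import Sequence


def select_blocks_to_prune(
    importance: Sequence[float],
    num_to_prune: int,
    protected_fraction: float,
) -> list[int]:
    """Hypothetical position-aware selection of blocks to remove.

    importance[i] is an assumed per-block score (higher = more essential).
    The earliest blocks (a protected prefix) are never eligible, mirroring
    the idea that early blocks carry semantic structure.
    """
    num_blocks = len(importance)
    num_protected = int(num_blocks * protected_fraction)  # size of the protected early prefix
    candidates = range(num_protected, num_blocks)         # only later blocks may be pruned
    if num_to_prune > len(candidates):
        raise ValueError("not enough unprotected blocks to prune")
    # Among the later blocks, drop the least essential ones first.
    ranked = sorted(candidates, key=lambda i: importance[i])
    return sorted(ranked[:num_to_prune])


def sensitivity_weights(sensitivity: Sequence[float]) -> list[float]:
    """Normalize assumed per-block sensitivities into weights that sum to 1."""
    total = sum(sensitivity)
    if total <= 0:
        raise ValueError("sensitivities must have a positive sum")
    return [s / total for s in sensitivity]


def weighted_distill_loss(
    student_feats: Sequence[Sequence[float]],
    teacher_feats: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> float:
    """Sum of per-block mean-squared errors, each scaled by its block weight.

    More sensitive blocks get a larger weight, i.e. stronger knowledge transfer.
    """
    loss = 0.0
    for s_vec, t_vec, w in zip(student_feats, teacher_feats, weights):
        mse = sum((s - t) ** 2 for s, t in zip(s_vec, t_vec)) / len(s_vec)
        loss += w * mse
    return loss


if __name__ == "__main__":
    scores = [0.9, 0.05, 0.7, 0.2, 0.6, 0.1, 0.3, 0.4]
    print(select_blocks_to_prune(scores, num_to_prune=3, protected_fraction=0.25))
```

For example, with eight blocks and `protected_fraction=0.25`, block 1 is never selected even though it has the lowest score, because it falls inside the protected early prefix; pruning is restricted to the later blocks.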
