TL;DR
FAIR-Pruner is a search-free framework for adaptive layer-wise structured pruning of neural networks. It sets each layer's pruning depth by measuring the overlap between a removal-oriented ranking and a protection-oriented ranking of units, balancing accuracy against compression.
Contribution
It proposes Tolerance of Difference (ToD), a rule for non-uniform layer-wise pruning that combines a removal signal with a protection signal, and supports it with theoretical analysis and extensive empirical validation.
Findings
Achieves strong accuracy-compression trade-offs on multiple datasets and architectures.
Demonstrates architectural extensibility with routed-expert models.
Provides open-source implementation for practical use.
Abstract
Structured pruning is a standard tool for compressing deep neural networks, but its practical performance depends on how sparsity is allocated across layers. We propose FAIR-Pruner, a search-free framework for adaptive layer-wise structured pruning. FAIR-Pruner uses two within-layer rankings: a removal-oriented signal that proposes candidate units and a protection-oriented signal that identifies task-sensitive units. Its core component, Tolerance of Difference (ToD), measures the overlap between the removal prefix and the protected tail, and uses a shared tolerance level to induce non-uniform pruning depths across layers. As a default vision instantiation, FAIR-Pruner combines a Wasserstein-based U-Score for class-conditional unit separability with a Taylor-based R-Score for task-level sensitivity; the same ToD allocation rule can also be paired with alternative removal signals.…
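The ToD allocation idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything below is an assumption made for illustration: the function name tod_pruning_depth, the overlap ratio |prefix ∩ tail| / k, taking the protected tail to be the k highest-scoring units under the protection signal, and choosing the largest k whose overlap stays at or below a shared tolerance alpha. It is not the authors' implementation.

```python
import numpy as np

def tod_pruning_depth(removal_scores, protection_scores, alpha):
    """Hypothetical ToD-style rule: return the largest prefix size k whose
    overlap ratio with the protected tail is at most alpha."""
    n = len(removal_scores)
    # Assumption: a lower removal score means the unit is proposed for removal earlier.
    removal_order = np.argsort(np.asarray(removal_scores), kind="stable")
    # Assumption: a higher protection score means the unit is more task-sensitive.
    protect_order = np.argsort(-np.asarray(protection_scores), kind="stable")

    best_k = 0
    for k in range(1, n + 1):
        prefix = set(removal_order[:k].tolist())  # first k removal candidates
        tail = set(protect_order[:k].tolist())    # k most protected units
        overlap = len(prefix & tail) / k          # assumed overlap measure
        if overlap <= alpha:
            best_k = k
    return best_k

# Toy layers with six units each; all scores are made up for illustration.
removal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
layers = {
    "agree": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],     # removable units are unprotected
    "conflict": [0.6, 0.5, 0.4, 0.3, 0.2, 0.1],  # removable units are protected
    "mixed": [0.1, 0.6, 0.2, 0.5, 0.3, 0.4],
}

# One shared tolerance yields a different pruning depth per layer.
depths = {name: tod_pruning_depth(removal, prot, alpha=0.25)
          for name, prot in layers.items()}
print(depths)  # {'agree': 3, 'conflict': 0, 'mixed': 1}
```

In this toy setup a single tolerance of 0.25 prunes three units where the two rankings agree, none where they conflict, and one in the mixed case, which is the kind of non-uniform depth the abstract attributes to a shared tolerance level.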
