Multi-Dimensional Pruning: Joint Channel, Layer and Block Pruning with Latency Constraint
Xinglong Sun, Barath Lakshmanan, Maying Shen, Shiyi Lan, Jingde Chen, and Jose Alvarez

TL;DR
This paper introduces a multi-dimensional pruning framework that jointly optimizes channels, layers, and blocks under latency constraints, significantly improving model efficiency and accuracy for vision tasks.
Contribution
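It presents a novel method that prunes channels, layers, and blocks jointly, pairs it with a model-wide latency model, and formulates the search for the pruned structure as a Mixed-Integer Nonlinear Program (MINLP), enabling aggressive pruning while targeting the best latency-accuracy trade-off.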
Findings
Outperforms previous pruning methods at high ratios.
Achieves higher FPS and accuracy in classification.
Sets new state-of-the-art in 3D object detection.
Abstract
As we push the boundaries of performance in various vision tasks, the models grow in size correspondingly. To keep up with this growth, we need very aggressive pruning techniques for efficient inference and deployment on edge devices. Existing pruning approaches are limited to channel pruning and struggle with aggressive parameter reductions. In this paper, we propose a novel multi-dimensional pruning framework that jointly optimizes pruning across channels, layers, and blocks while adhering to latency constraints. We develop a latency modeling technique that accurately captures model-wide latency variations during pruning, which is crucial for achieving an optimal latency-accuracy trade-off at high pruning ratios. We reformulate pruning as a Mixed-Integer Nonlinear Program (MINLP) to efficiently determine the optimal pruned structure with only a single pass. Our extensive results…
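To make the idea of latency-constrained joint pruning concrete, here is a minimal toy sketch in Python. It is not the paper's method or solver: it enumerates every candidate structure by brute force, which only works for tiny problems, whereas the paper formulates the search as a MINLP. Every name and number in it (`WIDTH`, `CHANNEL_CHOICES`, `IMPORTANCE`, the latency formula) is a hypothetical assumption chosen for illustration. Each block is either removed entirely, which also removes its layers, or kept with a channel count per inner layer, and the toy latency depends on the product of a layer's input and output channels, which is what makes the problem nonlinear.

```python
from itertools import product

WIDTH = 16                 # assumed fixed input/output width of every block
CHANNEL_CHOICES = (8, 16)  # assumed channel counts allowed for each inner layer

# Hypothetical importance retained by each inner layer at each channel count.
IMPORTANCE = [
    [{8: 0.5, 16: 0.8}, {8: 0.4, 16: 0.7}],  # block 0
    [{8: 0.2, 16: 0.3}, {8: 0.1, 16: 0.2}],  # block 1
]


def block_latency(config):
    """Toy latency: each layer costs in_channels * out_channels units."""
    if config is None:  # block removed entirely
        return 0
    c1, c2 = config
    # Three layers: WIDTH -> c1, c1 -> c2, c2 -> WIDTH.
    return WIDTH * c1 + c1 * c2 + c2 * WIDTH


def block_importance(block_idx, config):
    """Sum of the per-layer importance retained by this block's configuration."""
    if config is None:
        return 0.0
    return sum(IMPORTANCE[block_idx][i][c] for i, c in enumerate(config))


def best_structure(latency_budget):
    """Return (importance, latency, structure) maximizing importance within budget."""
    # Per block: either remove it (None) or pick a width for each inner layer.
    block_options = [None] + list(product(CHANNEL_CHOICES, repeat=2))
    best = None
    for structure in product(block_options, repeat=len(IMPORTANCE)):
        latency = sum(block_latency(cfg) for cfg in structure)
        if latency > latency_budget:
            continue  # violates the latency constraint
        score = sum(block_importance(b, cfg) for b, cfg in enumerate(structure))
        if best is None or score > best[0]:
            best = (score, latency, structure)
    return best


if __name__ == "__main__":
    print(best_structure(800))
```

With these made-up numbers and a budget of 800 units, the search keeps block 0 at full width and drops block 1 entirely, rather than shrinking both blocks. This illustrates why deciding channel counts and block removal jointly can beat channel-only pruning under a tight latency budget.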
Taxonomy
Topics: DNA and Biological Computing
Methods: Pruning
