FALCON: FLOP-Aware Combinatorial Optimization for Neural Network Pruning
Xiang Meng, Wenyu Chen, Riade Benbaki, Rahul Mazumder

TL;DR
FALCON introduces a FLOP-aware network pruning framework that optimizes accuracy, FLOPs, and sparsity simultaneously using combinatorial optimization, outperforming existing methods in accuracy at fixed FLOP budgets.
Contribution
The paper presents a novel ILP-based framework and algorithms for neural network pruning under joint FLOP and sparsity constraints, addressing the limitations of prior methods that target sparsity alone.
Findings
FALCON achieves higher accuracy than state-of-the-art pruning methods at the same FLOP level.
The approach effectively handles large-scale models with millions of parameters.
FALCON outperforms existing methods in gradual pruning with re-training.
Abstract
The increasing computational demands of modern neural networks present deployment challenges on resource-constrained devices. Network pruning offers a solution to reduce model size and computational cost while maintaining performance. However, most current pruning methods focus primarily on improving sparsity by reducing the number of nonzero parameters, often neglecting other deployment costs such as inference time, which are closely related to the number of floating-point operations (FLOPs). In this paper, we propose FALCON, a novel combinatorial-optimization-based framework for network pruning that jointly takes into account model accuracy (fidelity), FLOPs, and sparsity constraints. A main building block of our approach is an integer linear program (ILP) that simultaneously handles FLOP and sparsity constraints. We present a novel algorithm to approximately solve the ILP. We propose…
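To make the joint FLOP and sparsity constraints concrete, here is a minimal sketch of a pruning problem written as a tiny integer linear program and solved by exhaustive search. It is an illustration only, not the paper's algorithm: the abstract says FALCON solves its ILP approximately at scale, whereas brute force works only for a handful of weights. The function name `solve_tiny_pruning_ilp`, the integer importance scores, and the per-weight FLOP costs are all hypothetical.

```python
from itertools import product


def solve_tiny_pruning_ilp(scores, flop_costs, max_nonzeros, max_flops):
    """Exhaustively solve a toy FLOP- and sparsity-constrained selection problem.

    maximize   sum_i scores[i] * z[i]
    subject to sum_i z[i]                 <= max_nonzeros   (sparsity budget)
               sum_i flop_costs[i] * z[i] <= max_flops      (FLOP budget)
               z[i] in {0, 1}             (1 = keep weight i, 0 = prune it)

    Brute force over all 2^n masks; only sensible for very small n.
    """
    best_value, best_mask = float("-inf"), None
    for mask in product((0, 1), repeat=len(scores)):
        if sum(mask) > max_nonzeros:
            continue  # violates the sparsity budget
        if sum(c * z for c, z in zip(flop_costs, mask)) > max_flops:
            continue  # violates the FLOP budget
        value = sum(s * z for s, z in zip(scores, mask))
        if value > best_value:
            best_value, best_mask = value, mask
    return best_mask, best_value


# Hypothetical toy network (assumed numbers, not from the paper):
# weights 0-2 sit in a convolutional layer and are each reused at 16 output
# positions, so keeping one costs 16 multiply-adds; weights 3-5 sit in a
# fully connected layer and cost 1 multiply-add each.
scores = [9, 5, 4, 6, 3, 2]
flop_costs = [16, 16, 16, 1, 1, 1]

# Sparsity constraint only: keep the 3 highest-scoring weights.
sparsity_only = solve_tiny_pruning_ilp(
    scores, flop_costs, max_nonzeros=3, max_flops=float("inf")
)

# Same sparsity budget plus a FLOP budget of 20 multiply-adds.
joint = solve_tiny_pruning_ilp(scores, flop_costs, max_nonzeros=3, max_flops=20)

print("sparsity only:", sparsity_only)
print("joint:", joint)
```

In this toy instance both solutions keep exactly three weights, but the sparsity-only selection keeps two convolutional weights and costs 33 multiply-adds, while the joint selection keeps one convolutional and two fully connected weights for 18 multiply-adds. This is the gap between sparsity and FLOPs that the abstract describes.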
Taxonomy
Topics: Neural Networks and Applications
Methods: Pruning · Focus
