FPGA Resource-aware Structured Pruning for Real-Time Neural Networks
Benjamin Ramhorst, Vladimir Loncar, George A. Constantinides

TL;DR
This paper introduces a hardware-aware structured pruning method for neural networks that optimizes resource utilization on FPGAs, reducing DSP utilization by up to 92% and BRAM usage by up to 81% while maintaining performance.
Contribution
It presents a novel hardware-centric pruning approach formulated as a knapsack problem, improving resource efficiency over traditional unstructured pruning methods.
Findings
Achieves up to 92% reduction in DSP utilization.
Reduces BRAM usage by up to 81%.
Effective on diverse tasks including particle classification and image recognition.
Abstract
Neural networks achieve state-of-the-art performance in image classification, speech recognition, scientific analysis and many more application areas. Due to the high computational complexity and memory footprint of neural networks, various compression techniques, such as pruning and quantization, have been proposed in the literature. Pruning sparsifies a neural network, reducing the number of multiplications and the memory footprint. However, pruning often fails to capture properties of the underlying hardware, causing unstructured sparsity and load-balance inefficiency, thus bottlenecking resource improvements. We propose a hardware-centric formulation of pruning as a knapsack problem with resource-aware tensor structures. Evaluated on a range of tasks, including sub-microsecond particle classification at CERN's Large Hadron Collider and fast image classification, the proposed method…
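The knapsack view of pruning described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a single integer resource budget (for example, a number of DSP blocks), groups of consecutive weights as the "structures", and the sum of absolute weights as each group's value. The names `select_structures`, `prune_by_knapsack`, `group_size`, and `cost_per_group` are hypothetical and chosen only for illustration.

```python
import numpy as np

def select_structures(values, costs, budget):
    """Solve a 0/1 knapsack by dynamic programming.

    values: importance score per structure (assumed: sum of |w|)
    costs:  integer resource cost per structure (assumed: e.g. DSP blocks)
    budget: integer resource budget
    Returns the sorted indices of the structures to keep.
    """
    n = len(values)
    best = np.zeros((n + 1, budget + 1))
    for i in range(1, n + 1):
        v, c = values[i - 1], costs[i - 1]
        for b in range(budget + 1):
            best[i, b] = best[i - 1, b]  # option 1: skip structure i-1
            if c <= b and best[i - 1, b - c] + v > best[i, b]:
                best[i, b] = best[i - 1, b - c] + v  # option 2: keep it
    # Backtrack to recover which structures were kept.
    keep, b = [], budget
    for i in range(n, 0, -1):
        if best[i, b] != best[i - 1, b]:
            keep.append(i - 1)
            b -= costs[i - 1]
    return sorted(keep)

def prune_by_knapsack(weights, group_size, cost_per_group, budget):
    """Zero out whole groups of weights that the knapsack does not select.

    Assumes weights.size is divisible by group_size and that every group
    consumes the same amount of the resource (a simplification).
    """
    groups = weights.reshape(-1, group_size)
    values = np.abs(groups).sum(axis=1)
    costs = [cost_per_group] * len(groups)
    keep = select_structures(values, costs, budget)
    mask = np.zeros(len(groups), dtype=bool)
    mask[keep] = True
    pruned = (groups * mask[:, None]).reshape(weights.shape)
    return pruned, mask
```

A call such as `prune_by_knapsack(W, group_size=2, cost_per_group=1, budget=2)` keeps the two highest-valued groups of `W` and zeroes the rest. A formulation that covers several resources at once (such as DSP and BRAM together) would need a multi-dimensional knapsack, which this sketch does not attempt.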
Taxonomy
Topics: Advanced Neural Network Applications · Computational Physics and Python Applications · Parallel Computing and Optimization Techniques
Methods: Pruning
