HAPM -- Hardware Aware Pruning Method for CNN hardware accelerators in resource constrained devices
Federico Nicolas Peccia, Luciano Ferreyro, Alejandro Furfaro

TL;DR
This paper introduces a hardware-aware pruning method for CNNs tailored to FPGA accelerators. It significantly improves inference speed on resource-constrained devices by exploiting weight sparsity and the accelerator's hardware scheduling.
Contribution
The work presents a generic FPGA-compatible hardware architecture and a custom pruning algorithm optimized for this hardware, outperforming standard pruning in inference speed.
Findings
Hardware-aware pruning achieves 45% faster inference than standard pruning.
Proposed architecture supports multiple CNN configurations.
Custom pruning exploits hardware scheduling for efficiency.
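The following Python sketch makes the last finding concrete by comparing unstructured magnitude pruning with a pruning scheme aligned to a toy accelerator's schedule. It illustrates the general idea under stated assumptions and is not the authors' HAPM algorithm. We assume a hypothetical accelerator with `LANES` parallel MAC units that consumes one group of `LANES` consecutive weights per cycle and can skip a cycle only when every weight in that group is zero. The names `LANES`, `cycles_needed`, `magnitude_prune`, and `group_prune` are hypothetical.

```python
import numpy as np

# Hypothetical: number of parallel MAC units; one group of LANES weights per cycle.
LANES = 4


def cycles_needed(weights: np.ndarray, lanes: int = LANES) -> int:
    """Count cycles on the toy accelerator for a flattened 1-D weight vector.

    A cycle is skipped only when all weights in its group are zero.
    """
    groups = weights.reshape(-1, lanes)
    return int(np.count_nonzero(np.any(groups != 0, axis=1)))


def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Unstructured pruning: zero the individually smallest-magnitude weights."""
    k = int(round(sparsity * weights.size))
    pruned = weights.copy()
    if k > 0:
        idx = np.argsort(np.abs(weights))[:k]
        pruned[idx] = 0.0
    return pruned


def group_prune(weights: np.ndarray, sparsity: float, lanes: int = LANES) -> np.ndarray:
    """Schedule-aligned pruning: zero whole groups with the smallest L1 norm."""
    groups = weights.reshape(-1, lanes).copy()
    k = int(round(sparsity * groups.shape[0]))
    if k > 0:
        idx = np.argsort(np.abs(groups).sum(axis=1))[:k]
        groups[idx] = 0.0
    return groups.reshape(weights.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal(1024)  # toy layer with 1024 weights
    for name, pruned in (("dense", w),
                         ("magnitude 50%", magnitude_prune(w, 0.5)),
                         ("group 50%", group_prune(w, 0.5))):
        print(f"{name:>14}: {cycles_needed(pruned)} cycles, "
              f"sparsity {np.mean(pruned == 0):.2f}")
```

At the same 50% sparsity, `group_prune` halves the cycle count of this toy model (256 to 128). The scattered zeros left by `magnitude_prune` rarely fill a whole group, so few cycles are skipped. This is the basic intuition for why pruning that follows the hardware's scheduling can outperform standard pruning in inference speed.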
Abstract
In recent years, algorithms known as Convolutional Neural Networks (CNNs) have become increasingly popular, expanding their application range to several areas. In particular, the image processing field has advanced remarkably thanks to these algorithms. In IoT, a wide research field aims to develop hardware capable of executing them at the lowest possible energy cost while keeping image inference time acceptable. One can reconcile these apparently conflicting objectives by applying design and training techniques. The present work proposes a generic hardware architecture ready to be implemented on FPGA devices. It supports a wide range of configurations that allow the system to run different neural network architectures, and it dynamically exploits the sparsity that pruning techniques introduce into the mathematical operations of this kind of algorithm. The inference speed of…
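The abstract's central mechanism is skipping arithmetic that pruning has made redundant, and it can be illustrated in software. The sketch below is a simplified stand-in for what the paper implements in FPGA hardware, under our own assumptions. It computes a single-channel, stride-1 "valid" convolution as cross-correlation, which is how CNN frameworks compute it. It issues multiply-accumulates (MACs) only for nonzero kernel weights and counts them. `conv2d_zero_skip` is a hypothetical name, not part of the paper.

```python
import numpy as np


def conv2d_zero_skip(image: np.ndarray, kernel: np.ndarray) -> tuple[np.ndarray, int]:
    """Valid, stride-1 2-D cross-correlation that skips zero (pruned) weights.

    Returns the output feature map and the number of MACs actually performed.
    """
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.result_type(image, kernel))
    # Collect the positions of nonzero weights once per kernel, so a pruned
    # weight never reaches the MAC loop.
    nonzero = [(i, j, kernel[i, j])
               for i in range(kh) for j in range(kw) if kernel[i, j] != 0]
    macs = 0
    for y in range(oh):
        for x in range(ow):
            acc = 0.0
            for i, j, w in nonzero:
                acc += w * image[y + i, x + j]
                macs += 1
            out[y, x] = acc
    return out, macs


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    img = rng.standard_normal((8, 8))
    # 3x3 kernel with 4 of 9 weights pruned to zero.
    k = np.array([[0.5, 0.0, -1.0],
                  [0.0, 2.0, 0.0],
                  [1.5, 0.0, 0.25]])
    out, macs = conv2d_zero_skip(img, k)
    print(out.shape, macs)
```

With 4 of 9 weights pruned, the loop performs 5 MACs per output pixel instead of 9, and the result still matches a dense computation. Real hardware saves time only if its schedule can actually skip those slots, which is the problem that hardware-aware pruning targets.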
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Advanced Neural Network Applications · Advanced Memory and Neural Computing
Methods: Pruning · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
