HAPM -- Hardware Aware Pruning Method for CNN hardware accelerators in resource constrained devices

Federico Nicolas Peccia; Luciano Ferreyro; Alejandro Furfaro

arXiv:2408.14055·cs.AR·August 27, 2024

Open Access

TL;DR

This paper introduces a hardware-aware pruning method for CNNs tailored for FPGA accelerators, significantly improving inference speed in resource-constrained devices by exploiting sparsity and hardware scheduling.

Contribution

The work presents a generic FPGA-compatible hardware architecture and a custom pruning algorithm optimized for this hardware, outperforming standard pruning in inference speed.

Findings

01

Hardware-aware pruning improves inference time by 45% over standard pruning.

02

Proposed architecture supports multiple CNN configurations.

03

Custom pruning exploits hardware scheduling for efficiency.

Abstract

In recent years, Convolutional Neural Networks (CNNs) have become increasingly popular, expanding their application range to several areas. In particular, the image processing field has advanced remarkably thanks to these algorithms. In IoT, a wide research field aims to develop hardware capable of executing them at the lowest possible energy cost while keeping image inference time acceptable. These apparently conflicting objectives can be reconciled by applying design and training techniques. The present work proposes a generic hardware architecture ready to be implemented on FPGA devices. It supports a wide range of configurations, which allows the system to run different neural network architectures, and it dynamically exploits the sparsity that pruning introduces in the mathematical operations of these algorithms. The inference speed of…
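To make the idea of exploiting pruning-induced sparsity concrete, here is a minimal Python sketch. It is not the paper's HAPM algorithm or its FPGA datapath: it uses plain unstructured magnitude pruning and a software dot product that skips zero weights. The function names `magnitude_prune` and `sparse_dot` and the sample values are hypothetical, chosen only for illustration.

```python
def magnitude_prune(weights, sparsity):
    """Zero the smallest-magnitude fraction of weights (plain magnitude pruning).

    This is generic pruning, assumed here for illustration; it is not the
    hardware-aware variant proposed in the paper.
    """
    k = int(len(weights) * sparsity)  # number of weights to zero out
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


def sparse_dot(weights, activations):
    """Dot product that skips zero weights and counts the multiplies performed."""
    acc = 0.0
    macs = 0
    for w, a in zip(weights, activations):
        if w == 0.0:
            continue  # a zero weight contributes nothing, so skip the multiply
        acc += w * a
        macs += 1
    return acc, macs


# Hypothetical filter weights and input activations.
weights = [0.9, -0.1, 0.05, -0.7, 0.3, 0.02, -0.6, 0.2]
activations = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

pruned = magnitude_prune(weights, sparsity=0.5)
result, macs = sparse_dot(pruned, activations)
print(f"multiplies performed: {macs} of {len(weights)}")
```

In this sketch the saving is simply the number of skipped multiplies. On a real accelerator the benefit depends on how operations are scheduled across compute units, which is the gap a hardware-aware pruning method aims to close.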

Taxonomy

Topics: CCD and CMOS Imaging Sensors · Advanced Neural Network Applications · Advanced Memory and Neural Computing

Methods: Pruning · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings