Fast convolutional neural networks on FPGAs with hls4ml

Thea Aarrestad; Vladimir Loncar; Nicolò Ghielmetti; Maurizio Pierini; Sioni Summers; Jennifer Ngadiuba; Christoffer Petersson; Hampus Linander; Yutaro Iiyama; Giuseppe Di Guglielmo; Javier Duarte; Philip Harris; Dylan Rankin; Sergo Jindariani; Kevin Pedro; Nhan Tran; Mia Liu; Edward Kreinar; Zhenbin Wu; and Duc Hoang

arXiv:2101.05108·cs.LG·July 19, 2021

TL;DR

This paper presents an extension of the hls4ml library enabling deployment of ultra low-latency, low-power convolutional neural networks on FPGAs, with significant resource savings and minimal accuracy loss.

Contribution

We extend hls4ml to support convolutional neural networks on FPGAs, achieving microsecond latency and high resource efficiency through model compression techniques.

Findings

01

Achieved 5 microsecond inference latency on FPGA.

02

Reduced FPGA critical resource consumption by 97% with zero loss in model accuracy.

03

Demonstrated model compression through pruning and quantization-aware training.

Abstract

We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks with convolutional layers on FPGAs. By extending the hls4ml library, we demonstrate an inference latency of $5\,\mu\mathrm{s}$ using convolutional architectures, targeting microsecond-latency applications like those at the CERN Large Hadron Collider. Considering benchmark models trained on the Street View House Numbers Dataset, we demonstrate various methods for model compression in order to fit the computational constraints of a typical FPGA device used in trigger and data acquisition systems of particle detectors. In particular, we discuss pruning and quantization-aware training, and demonstrate how resource utilization can be significantly reduced with little to no loss in model accuracy. We show that the FPGA critical resource consumption can be reduced by 97% with zero loss in model accuracy, and…
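To make the two compression ideas named in the abstract concrete, here is a minimal pure-Python sketch. It is not the paper's code and does not use the hls4ml API. The function names `magnitude_prune` and `quantize_fixed_point`, the example weights, and the choice of round-to-nearest with saturation are all assumptions made for illustration. It also quantizes weights after the fact, whereas quantization-aware training applies such rounding during training so the model can adapt to it.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Hypothetical helper: illustrates magnitude-based pruning in general,
    not the specific pruning schedule used in the paper.
    """
    n_prune = int(len(weights) * sparsity)
    pruned = list(weights)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_fixed_point(x, total_bits, int_bits):
    """Map x onto a signed fixed-point grid.

    `total_bits` is the full word width and `int_bits` counts the integer
    bits including the sign bit. Rounding to nearest and saturating at the
    range limits are assumptions of this sketch.
    """
    frac_bits = total_bits - int_bits
    scale = 2 ** frac_bits
    lowest = -(2 ** (total_bits - 1))
    highest = 2 ** (total_bits - 1) - 1
    q = round(x * scale)
    q = max(lowest, min(highest, q))  # saturate instead of wrapping
    return q / scale


# Illustrative weights (made up for this example).
weights = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2]
pruned = magnitude_prune(weights, sparsity=0.5)
quantized = [quantize_fixed_point(w, total_bits=6, int_bits=1) for w in pruned]
print(pruned)
print(quantized)
```

Pruned weights are exact zeros, so the corresponding multiplications need not be implemented at all, and narrower fixed-point words need less arithmetic logic per multiplication. Both effects reduce FPGA resource usage, which is the motivation the abstract gives for combining the two techniques.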

Taxonomy

Methods: Pruning