H2PIPE: High throughput CNN Inference on FPGAs with High-Bandwidth Memory

Mario Doumet; Marius Stan; Mathew Hall; Vaughn Betz

arXiv:2408.09209 · cs.AR · August 20, 2024

Open Access

TL;DR

This paper presents H2PIPE, a high-throughput FPGA-based CNN inference accelerator that effectively combines high-bandwidth memory and on-chip storage to accelerate large CNNs with significant speed-ups over prior work.

Contribution

It introduces a novel hardware and algorithmic approach to optimize memory usage and bandwidth for large CNNs on FPGAs, integrating with a CNN compiler for high performance.

Findings

01 Achieves at least 19.4x speed-up over prior work on ResNet-18

02 Achieves at least 5.1x speed-up over prior work on ResNet-50

03 Achieves at least 10.5x speed-up over prior work on VGG-16

Abstract

Convolutional Neural Networks (CNNs) combine large amounts of parallelizable computation with frequent memory access. Field Programmable Gate Arrays (FPGAs) can achieve low latency and high throughput CNN inference by implementing dataflow accelerators that pipeline layer-specific hardware to implement an entire network. By implementing a different processing element for each CNN layer, these layer-pipelined accelerators can achieve high compute density, but having all layers processing in parallel requires high memory bandwidth. Traditionally this has been satisfied by storing all weights on chip, but this is infeasible for the largest CNNs, which are often those most in need of acceleration. In this work we augment a state-of-the-art dataflow accelerator (HPIPE) to leverage both High-Bandwidth Memory (HBM) and on-chip storage, enabling high performance layer-pipelined dataflow…
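The abstract's point that all weights cannot be kept on chip for the largest CNNs can be illustrated with simple arithmetic. The sketch below counts the parameters of the standard VGG-16 architecture and compares the resulting weight footprint against an on-chip memory budget. The 8-bit weight precision and the 30 MB budget are illustrative assumptions, not figures from the paper, and the function names are hypothetical.

```python
# Layer shapes of the standard VGG-16 network (ImageNet, 224x224 input).
# (in_channels, out_channels) for the 13 convolution layers, all 3x3 kernels.
VGG16_CONV = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]
# (in_features, out_features) for the 3 fully connected layers.
VGG16_FC = [(25088, 4096), (4096, 4096), (4096, 1000)]


def count_parameters(conv_layers, fc_layers, kernel=3):
    """Count weights plus biases for the given conv and FC layers."""
    conv = sum(cin * cout * kernel * kernel + cout for cin, cout in conv_layers)
    fc = sum(nin * nout + nout for nin, nout in fc_layers)
    return conv + fc


def weight_megabytes(num_params, bits_per_weight):
    """Storage needed for the parameters, in megabytes (10**6 bytes)."""
    return num_params * bits_per_weight / 8 / 1e6


def fits_on_chip(num_params, bits_per_weight, on_chip_megabytes):
    """True if all parameters fit within the given on-chip budget."""
    return weight_megabytes(num_params, bits_per_weight) <= on_chip_megabytes


TOTAL = count_parameters(VGG16_CONV, VGG16_FC)  # 138,357,544 parameters

if __name__ == "__main__":
    BITS = 8            # assumed weight precision, not taken from the paper
    BUDGET_MB = 30.0    # hypothetical on-chip memory budget, for illustration
    print(f"VGG-16 parameters: {TOTAL:,}")
    print(f"Footprint at {BITS}-bit: {weight_megabytes(TOTAL, BITS):.1f} MB")
    print(f"Fits in {BUDGET_MB} MB on chip: {fits_on_chip(TOTAL, BITS, BUDGET_MB)}")
```

Under these assumptions the weights alone exceed the on-chip budget several times over, which is the kind of gap that motivates placing some weights in off-chip HBM. How the paper decides which layers use HBM is not covered by this sketch.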

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Brain Tumor Detection and Classification

Methods: VGG-16