H2PIPE: High throughput CNN Inference on FPGAs with High-Bandwidth Memory
Mario Doumet, Marius Stan, Mathew Hall, Vaughn Betz

TL;DR
This paper presents H2PIPE, a high-throughput FPGA-based CNN inference accelerator that effectively combines high-bandwidth memory and on-chip storage to accelerate large CNNs with significant speed-ups over prior work.
Contribution
The paper introduces a hardware and algorithmic approach that optimizes memory usage and bandwidth for large CNNs on FPGAs, and it integrates this approach with a CNN compiler for high performance.
Findings
Achieves at least a 19.4x speed-up on ResNet-18 over prior work
Achieves at least a 5.1x speed-up on ResNet-50 over prior work
Achieves at least a 10.5x speed-up on VGG-16 over prior work
Abstract
Convolutional Neural Networks (CNNs) combine large amounts of parallelizable computation with frequent memory access. Field Programmable Gate Arrays (FPGAs) can achieve low latency and high throughput CNN inference by implementing dataflow accelerators that pipeline layer-specific hardware to implement an entire network. By implementing a different processing element for each CNN layer, these layer-pipelined accelerators can achieve high compute density, but having all layers processing in parallel requires high memory bandwidth. Traditionally this has been satisfied by storing all weights on chip, but this is infeasible for the largest CNNs, which are often those most in need of acceleration. In this work we augment a state-of-the-art dataflow accelerator (HPIPE) to leverage both High-Bandwidth Memory (HBM) and on-chip storage, enabling high performance layer-pipelined dataflow…
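The trade-off described in the abstract (all weights on chip when they fit, HBM when they do not) can be illustrated with a small sketch. This is not the placement algorithm from the paper: the greedy rule, the `Layer` and `place_weights` names, the layer sizes, and the on-chip budget are all hypothetical, chosen only to show one way a compiler might split weight storage between on-chip memory and HBM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    weight_bytes: int  # hypothetical size, not taken from the paper


def place_weights(layers, on_chip_budget_bytes):
    """Toy greedy placement (an assumption, not the paper's method):
    move the largest weight buffers to HBM until the rest fits on chip."""
    on_chip = {layer.name for layer in layers}
    remaining = sum(layer.weight_bytes for layer in layers)
    # Visit layers from largest to smallest weight footprint.
    for layer in sorted(layers, key=lambda l: l.weight_bytes, reverse=True):
        if remaining <= on_chip_budget_bytes:
            break  # everything still marked on-chip now fits
        on_chip.remove(layer.name)
        remaining -= layer.weight_bytes
    return {
        layer.name: ("on_chip" if layer.name in on_chip else "hbm")
        for layer in layers
    }


# Hypothetical four-layer network and a 500 kB on-chip budget.
layers = [
    Layer("conv1", 10_000),
    Layer("conv2", 300_000),
    Layer("conv3", 1_200_000),
    Layer("fc", 2_000_000),
]
placement = place_weights(layers, on_chip_budget_bytes=500_000)
print(placement)
```

In this toy setup the two largest layers are sent to HBM and the two smallest stay on chip. A real design would also have to account for the bandwidth each layer demands, which this sketch ignores.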
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Brain Tumor Detection and Classification
Methods: VGG-16
