unzipFPGA: Enhancing FPGA-based CNN Engines with On-the-Fly Weights Generation

Stylianos I. Venieris; Javier Fernandez-Marques; Nicholas D. Lane

arXiv:2103.05600·cs.CV·April 6, 2021

TL;DR

This paper introduces unzipFPGA, a framework that enhances FPGA-based CNN engines with on-the-fly weights generation, improving performance on memory-bound layers and raising resource utilization.

Contribution

It proposes a novel hardware component and processing element design for on-the-fly weights generation, optimizing FPGA CNN performance under bandwidth constraints.

Findings

1. Achieves an average speedup of 2.14x over optimized CNN engines.

2. Attains a 71% improvement over pruned CNN engines.

3. Reaches up to 3.69x higher performance density than state-of-the-art accelerators.

Abstract

Single computation engines have become a popular design choice for FPGA-based convolutional neural networks (CNNs) enabling the deployment of diverse models without fabric reconfiguration. This flexibility, however, often comes with significantly reduced performance on memory-bound layers and resource underutilisation due to suboptimal mapping of certain layers on the engine's fixed configuration. In this work, we investigate the implications in terms of CNN engine design for a class of models that introduce a pre-convolution stage to decompress the weights at run time. We refer to these approaches as on-the-fly. To minimise the negative impact of limited bandwidth on memory-bound layers, we present a novel hardware component that enables the on-chip on-the-fly generation of weights. We further introduce an input selective processing element (PE) design that balances the load between…
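The idea of a pre-convolution stage that decompresses weights at run time can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the filter is rebuilt as a weighted sum of a small fixed set of ±1 basis vectors, and the names `hadamard` and `generate_weights`, the basis choice, and the sizes are all hypothetical. The point is only that a few coefficients per filter are fetched from off-chip memory while the full weights are generated next to the compute.

```python
# Hypothetical illustration of on-the-fly weights generation.
# The basis choice, names, and sizes are assumptions, not the paper's design.

def hadamard(n):
    """Return an n x n matrix of +1/-1 entries (Sylvester construction).

    n must be a power of two.
    """
    h = [[1]]
    while len(h) < n:
        # [[H, H], [H, -H]] doubles the size at each step.
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def generate_weights(coeffs, basis):
    """Rebuild a flattened filter as a weighted sum of basis vectors."""
    length = len(basis[0])
    return [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(length)]


# A fixed basis that would stay on chip: 4 vectors of length 16.
basis = hadamard(16)[:4]

# The only per-filter data that would cross the off-chip memory interface.
coeffs = [0.5, -0.25, 0.125, 1.0]

weights = generate_weights(coeffs, basis)

# Ratio of generated weights to transferred values.
compression = len(weights) / len(coeffs)
print(len(weights), compression)
```

In this toy setting each 16-element filter costs 4 transferred values instead of 16, which is the kind of traffic reduction that matters for a memory-bound layer. The extra multiply-accumulate work to rebuild the weights is the price paid for it.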
