Mitigating Memory Wall Effects in CNN Engines with On-the-Fly Weights Generation
Stylianos I. Venieris, Javier Fernandez-Marques, Nicholas D. Lane

TL;DR
This paper introduces unzipFPGA, a CNN inference system that generates weights on-chip at run time to mitigate memory bottlenecks, and reports performance gains over existing FPGA and GPU solutions.
Contribution
The work presents a new hardware architecture with on-the-fly weights generation, an automated hardware-aware methodology, and input selective processing to enhance CNN inference efficiency.
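As a rough illustration of the on-the-fly idea, the sketch below assumes each filter is stored as a few coefficients over a small fixed basis and is rebuilt just before use. This is not the paper's actual generator, which this summary does not describe; the function name `generate_weights`, the basis, and all sizes are hypothetical.

```python
# Hypothetical sketch: rebuild filter weights from compact coefficients
# at run time instead of fetching full weights from off-chip memory.
# The basis and sizes are illustrative assumptions, not from the paper.

def generate_weights(coeffs, basis):
    """Return one filter per coefficient row as a linear combination of basis vectors."""
    n_elems = len(basis[0])
    return [
        [sum(c * b[i] for c, b in zip(row, basis)) for i in range(n_elems)]
        for row in coeffs
    ]

# Fixed basis of 2 vectors, each describing a flattened 2x2 kernel.
basis = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
]
# Three filters, each stored as just 2 coefficients instead of 4 weights.
coeffs = [
    [0.5, 0.5],
    [1.0, 0.0],
    [0.0, 2.0],
]

weights = generate_weights(coeffs, basis)
stored = sum(len(row) for row in coeffs)   # values that must be fetched
generated = sum(len(w) for w in weights)   # values handed to the conv engine
```

In this toy setup, 6 stored coefficients expand into 12 weights. The basis is fixed and shared, so only the coefficients would need to cross the memory interface, which is the kind of traffic reduction that helps memory-bound layers.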
Findings
2.57x performance efficiency gain over optimized GPUs.
Up to 3.94x higher performance density than existing FPGA accelerators.
Effective mitigation of memory bandwidth limitations in CNN layers.
Abstract
The unprecedented accuracy of convolutional neural networks (CNNs) across a broad range of AI tasks has led to their widespread deployment in mobile and embedded settings. In a pursuit for high-performance and energy-efficient inference, significant research effort has been invested in the design of FPGA-based CNN accelerators. In this context, single computation engines constitute a popular approach to support diverse CNN modes without the overhead of fabric reconfiguration. Nevertheless, this flexibility often comes with significantly degraded performance on memory-bound layers and resource underutilisation due to the suboptimal mapping of certain layers on the engine's fixed configuration. In this work, we investigate the implications in terms of CNN engine design for a class of models that introduce a pre-convolution stage to decompress the weights at run time. We refer to these…
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Brain Tumor Detection and Classification
