Semi-Streaming Architecture: A New Design Paradigm for CNN Implementation on FPGAs
Nazariy K. Shaydyuk, Eugene B. John

TL;DR
This paper introduces a semi-streaming FPGA architecture for CNNs that combines layer-specific processing engines to improve efficiency and flexibility, demonstrated with an 8-bit MobileNetV2 implementation.
Contribution
It proposes a novel semi-streaming design paradigm that integrates specialized engines for different CNN layers, enhancing resource utilization and performance.
Findings
Achieved up to 89.6 GOp/s throughput for certain layers.
Energy efficiency of 5.32 GOp/s/W at 100 MHz.
Implemented a flexible, layer-specific FPGA CNN accelerator.
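As a rough sanity check on these figures, the relation between throughput, clock frequency, and operations per clock cycle can be sketched as below. This is only illustrative arithmetic: it assumes the 89.6 GOp/s peak was measured at the same 100 MHz clock quoted for the energy-efficiency figure, which this summary does not state, and it assumes the common convention that one multiply-accumulate (MAC) counts as two operations. The function names are hypothetical.

```python
def ops_per_cycle(throughput_gops: float, clock_mhz: float) -> float:
    """Operations completed per clock cycle for a given throughput and clock."""
    return (throughput_gops * 1e9) / (clock_mhz * 1e6)


def macs_per_cycle(throughput_gops: float, clock_mhz: float) -> float:
    """MACs per cycle, ASSUMING 1 MAC = 2 operations (a convention, not from the paper)."""
    return ops_per_cycle(throughput_gops, clock_mhz) / 2.0


if __name__ == "__main__":
    # ASSUMPTION: peak throughput and the 100 MHz clock refer to the same measurement.
    peak_gops = 89.6
    clock_mhz = 100.0
    print(f"ops per cycle:  {ops_per_cycle(peak_gops, clock_mhz):.1f}")
    print(f"MACs per cycle: {macs_per_cycle(peak_gops, clock_mhz):.1f}")
```

Under those assumptions the peak figure corresponds to roughly 896 operations, or about 448 MACs, per cycle; if the peak was measured at a different clock, the numbers change accordingly.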
Abstract
Recent research advances in deep learning have led to the development of small and powerful Convolutional Neural Network (CNN) architectures. Meanwhile, Field Programmable Gate Arrays (FPGAs) have become a popular hardware target for their deployment, with implementations splitting into two main categories: streaming hardware architectures and single computation engine designs. Streaming hardware architectures generally require implementing every layer as a discrete processing unit, and are suitable for smaller software models whose unfolded versions fit into resource-constrained targets. Single computation engines, on the other hand, can be scaled to fit into a device and execute CNN models of different sizes and complexities; however, the achievable performance of one-size-fits-all implementations may vary across CNNs with different workload attributes…
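The trade-off between the two categories described in the abstract can be illustrated with a toy cycle-count model. This is a minimal sketch, not the paper's architecture or its semi-streaming design: the layer names, operation counts, and per-engine rates are all hypothetical values chosen for illustration. In the streaming case each layer has its own engine and the pipeline is paced by its slowest stage; in the single-engine case one shared engine runs the layers back to back, and its effective rate is assumed to drop for a layer type it maps poorly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    ops: int  # operations per input frame (hypothetical value)


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def streaming_frame_interval(layers, engine_rate):
    """Cycles between finished frames when every layer has a dedicated engine.

    In steady state the pipeline is limited by its slowest stage.
    engine_rate maps layer name -> ops per cycle of that layer's engine.
    """
    return max(ceil_div(l.ops, engine_rate[l.name]) for l in layers)


def single_engine_frame_interval(layers, effective_rate):
    """Cycles per frame when one shared engine executes the layers sequentially.

    effective_rate maps layer name -> ops per cycle the shared engine actually
    sustains on that layer (lower where the workload fits the engine poorly).
    """
    return sum(ceil_div(l.ops, effective_rate[l.name]) for l in layers)


# Hypothetical three-layer workload.
LAYERS = [Layer("conv", 1200), Layer("depthwise", 300), Layer("pointwise", 900)]

# Streaming: 8 ops/cycle of compute split across three dedicated engines.
STREAMING_RATE = {"conv": 4, "depthwise": 1, "pointwise": 3}

# Single engine: 8 ops/cycle peak, ASSUMED to sustain only 2 on depthwise layers.
SINGLE_RATE = {"conv": 8, "depthwise": 2, "pointwise": 8}

if __name__ == "__main__":
    print("streaming cycles/frame:    ", streaming_frame_interval(LAYERS, STREAMING_RATE))
    print("single-engine cycles/frame:", single_engine_frame_interval(LAYERS, SINGLE_RATE))
```

With these made-up numbers the balanced streaming pipeline finishes a frame every 300 cycles while the shared engine needs 413, but the streaming design only works if every layer's engine fits on the device at once. The model ignores memory bandwidth, pipeline fill latency, and reconfiguration overhead.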
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · CCD and CMOS Imaging Sensors
