NEURAghe: Exploiting CPU-FPGA Synergies for Efficient and Flexible CNN Inference Acceleration on Zynq SoCs
Paolo Meloni, Alessandro Capotondi, Gianfranco Deriu, Michele Brian, Francesco Conti, Davide Rossi, Luigi Raffo, Luca Benini

TL;DR
NEURAghe is a flexible hardware/software platform that accelerates CNN inference on Zynq SoCs by combining a convolution-specific processor with ARM cores, achieving high performance and energy efficiency for real-time image recognition tasks.
Contribution
This work introduces NEURAghe, a novel heterogeneous computing platform that synergistically combines FPGA-based accelerators with ARM processors for efficient CNN inference.
Findings
Peak performance of 169 Gops/s
Energy efficiency of 17 Gops/W
Achieves 5.5 fps on VGG-16 and 6.6 fps on ResNet-18
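As a rough reading aid, the figures above can be combined into two derived quantities: the power draw implied by the peak throughput and the energy efficiency, and the average time per frame implied by each frame rate. The sketch below is only back-of-the-envelope arithmetic. It assumes the 169 Gops/s and 17 Gops/W figures describe the same operating point, which the summary does not state, and the function names are illustrative rather than taken from the paper.

```python
# Figures quoted in the Findings list.
PEAK_GOPS_PER_S = 169.0        # peak performance
EFFICIENCY_GOPS_PER_W = 17.0   # energy efficiency
FPS = {"VGG-16": 5.5, "ResNet-18": 6.6}


def implied_power_watts(throughput_gops_per_s: float,
                        efficiency_gops_per_w: float) -> float:
    """Power implied by throughput / efficiency.

    Assumption: both numbers were measured at the same operating point.
    """
    return throughput_gops_per_s / efficiency_gops_per_w


def time_per_frame_ms(fps: float) -> float:
    """Average time per frame in milliseconds for a given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    power = implied_power_watts(PEAK_GOPS_PER_S, EFFICIENCY_GOPS_PER_W)
    print(f"Implied power: {power:.1f} W")
    for network, fps in FPS.items():
        print(f"{network}: {time_per_frame_ms(fps):.0f} ms per frame")
```

Under that assumption the implied power is a little under 10 W, and both networks take well over 100 ms per frame on average.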
Abstract
Deep convolutional neural networks (CNNs) obtain outstanding results in tasks that require human-level understanding of data, like image or speech recognition. However, their computational load is significant, motivating the development of CNN-specialized accelerators. This work presents NEURAghe, a flexible and efficient hardware/software solution for the acceleration of CNNs on Zynq SoCs. NEURAghe leverages the synergistic usage of Zynq ARM cores and of a powerful and flexible Convolution-Specific Processor deployed on the reconfigurable logic. The Convolution-Specific Processor embeds both a convolution engine and a programmable soft core, releasing the ARM processors from most of the supervision duties and allowing the accelerator to be controlled by software at an ultra-fine granularity. This methodology opens the way for cooperative heterogeneous computing: while the accelerator…
