NEURAghe: Exploiting CPU-FPGA Synergies for Efficient and Flexible CNN Inference Acceleration on Zynq SoCs

Paolo Meloni; Alessandro Capotondi; Gianfranco Deriu; Michele Brian; Francesco Conti; Davide Rossi; Luigi Raffo; Luca Benini

arXiv:1712.00994 · cs.NE · November 28, 2019


TL;DR

NEURAghe is a flexible hardware/software platform that accelerates CNN inference on Zynq SoCs by combining a convolution-specific processor with ARM cores, achieving high performance and energy efficiency for real-time image recognition tasks.

Contribution

This work introduces NEURAghe, a novel heterogeneous computing platform that synergistically combines FPGA-based accelerators with ARM processors for efficient CNN inference.

Findings

1. Peak performance of 169 Gops/s
2. Energy efficiency of 17 Gops/W
3. Frame rates of 5.5 fps on VGG-16 and 6.6 fps on ResNet-18
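As a rough consistency check on these figures, the sketch below derives two quantities the summary does not state directly: the power implied by dividing the peak throughput by the energy efficiency, and the per-frame latency implied by each frame rate. Treating 169 Gops/s and 17 Gops/W as describing the same operating point is an assumption; the summary does not say so.

```python
# Back-of-the-envelope figures derived only from the headline numbers above.
# Assumption (not stated in the summary): peak throughput and energy
# efficiency refer to the same operating point, so power ~= Gops/s / (Gops/W).

PEAK_GOPS_PER_S = 169.0       # reported peak performance, Gops/s
EFFICIENCY_GOPS_PER_W = 17.0  # reported energy efficiency, Gops/W

FPS = {"VGG-16": 5.5, "ResNet-18": 6.6}  # reported frame rates


def implied_power_w(gops_per_s: float, gops_per_w: float) -> float:
    """Power in watts implied by a throughput figure and an efficiency figure."""
    return gops_per_s / gops_per_w


def frame_time_ms(fps: float) -> float:
    """Time per frame in milliseconds at the given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    power = implied_power_w(PEAK_GOPS_PER_S, EFFICIENCY_GOPS_PER_W)
    print(f"implied power: {power:.2f} W")
    for net, fps in FPS.items():
        print(f"{net}: {frame_time_ms(fps):.1f} ms/frame")
```

Under that assumption the implied power is just under 10 W (169 / 17 ≈ 9.94), and the frame rates correspond to roughly 182 ms per VGG-16 frame and 152 ms per ResNet-18 frame.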

Abstract

Deep convolutional neural networks (CNNs) obtain outstanding results in tasks that require human-level understanding of data, such as image or speech recognition. However, their computational load is significant, motivating the development of CNN-specialized accelerators. This work presents NEURAghe, a flexible and efficient hardware/software solution for the acceleration of CNNs on Zynq SoCs. NEURAghe leverages the synergistic use of the Zynq ARM cores and of a powerful and flexible Convolution-Specific Processor deployed on the reconfigurable logic. The Convolution-Specific Processor embeds both a convolution engine and a programmable soft core, relieving the ARM processors of most supervision duties and allowing the accelerator to be controlled by software at an ultra-fine granularity. This methodology opens the way for cooperative heterogeneous computing: while the accelerator…
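To make "software control at an ultra-fine granularity" concrete, here is a minimal sketch, not NEURAghe's actual interface: a hypothetical `controller` loop (standing in for the programmable soft core) splits a convolution's output into row tiles and issues one job per tile to a hypothetical `conv_engine_tile` function (standing in for the convolution engine). The row-tile partitioning and both names are illustrative assumptions; the point is only that the sequencing logic lives in software while the arithmetic lives in the engine.

```python
# Illustrative sketch of software-sequenced, tile-by-tile convolution.
# Hypothetical: `conv_engine_tile` models the hardware convolution engine and
# `controller` models the soft core that issues jobs; neither is NEURAghe's API.

from typing import List

Matrix = List[List[float]]


def conv_engine_tile(image: Matrix, kernel: Matrix, row0: int, rows: int) -> Matrix:
    """'Engine' job: valid 2D convolution (cross-correlation, as used in CNNs)
    producing output rows [row0, row0 + rows)."""
    k = len(kernel)
    out_w = len(image[0]) - k + 1
    out: Matrix = []
    for r in range(row0, row0 + rows):
        row = []
        for c in range(out_w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def controller(image: Matrix, kernel: Matrix, tile_rows: int) -> Matrix:
    """'Soft core' loop: split the output into row tiles and issue one
    engine job per tile, concatenating the partial results."""
    out_h = len(image) - len(kernel) + 1
    result: Matrix = []
    for row0 in range(0, out_h, tile_rows):
        rows = min(tile_rows, out_h - row0)
        result.extend(conv_engine_tile(image, kernel, row0, rows))
    return result


if __name__ == "__main__":
    img = [[1.0] * 4 for _ in range(4)]
    ker = [[1.0] * 3 for _ in range(3)]
    print(controller(img, ker, tile_rows=1))
```

Changing `tile_rows` changes only how many jobs the controller issues, not the result; in a real accelerator that choice would be driven by on-chip buffer sizes, which this sketch does not model.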
