Synergy: A HW/SW Framework for High Throughput CNNs on Embedded Heterogeneous SoC
Guanwen Zhong, Akshat Dubey, Tan Cheng, Tulika Mitra

TL;DR
Synergy is a hardware-software co-designed framework that enables high-throughput, energy-efficient CNN inference on embedded heterogeneous SoCs by leveraging multi-threading and adaptive workload balancing across FPGA and NEON accelerators.
Contribution
It introduces a unified, adaptable framework for CNN inference on embedded SoCs that efficiently utilizes all on-chip resources without hardware modifications.
Findings
Achieves 7.3X speedup over software-only solutions
Demonstrates superior throughput and energy efficiency
Supports runtime adaptation to different CNN configurations
Abstract
Convolutional Neural Networks (CNNs) have been widely deployed in diverse application domains. There has been significant progress in accelerating both their training and inference using high-performance GPUs, FPGAs, and custom ASICs for datacenter-scale environments. The recent proliferation of mobile and IoT devices has necessitated real-time, energy-efficient deep neural network inference on embedded-class, resource-constrained platforms. In this context, we present Synergy, an automated, hardware-software co-designed, pipelined, high-throughput CNN inference framework on embedded heterogeneous system-on-chip (SoC) architectures (Xilinx Zynq). Through multi-threading, Synergy leverages all the available on-chip resources, which include the dual-core ARM processor along with the FPGA and the NEON SIMD engines as accelerators. Moreover, Synergy provides a unified…
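The abstract does not spell out how work is divided among the ARM cores, the FPGA, and the NEON engines. As a rough illustration of the general idea of multi-threaded load balancing across accelerators of unequal speed, the sketch below has one thread per worker pull jobs from a shared queue, so a faster worker naturally ends up taking more jobs. This is not Synergy's scheduler: the function name run_shared_queue, the worker names, and the per-job delays are all hypothetical, and time.sleep merely stands in for real computation.

```python
import queue
import threading
import time


def run_shared_queue(num_jobs, worker_delays):
    """Process num_jobs jobs with one thread per worker.

    worker_delays maps a (hypothetical) worker name to the simulated
    seconds that worker needs per job.
    Returns a dict mapping each worker name to the job ids it handled.
    """
    jobs = queue.Queue()
    for job_id in range(num_jobs):
        jobs.put(job_id)

    # Each thread appends only to its own list, so no extra lock is needed.
    done = {name: [] for name in worker_delays}

    def worker(name, delay):
        while True:
            try:
                job_id = jobs.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is finished
            time.sleep(delay)  # stand-in for the actual computation
            done[name].append(job_id)

    threads = [
        threading.Thread(target=worker, args=(name, delay))
        for name, delay in worker_delays.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


if __name__ == "__main__":
    # Assumed, illustrative speeds only; not measurements from the paper.
    result = run_shared_queue(40, {"fpga": 0.001, "neon": 0.004})
    for name, ids in result.items():
        print(name, "handled", len(ids), "jobs")
```

Because workers pull jobs rather than receive a fixed share up front, the split adjusts itself to whatever speeds the workers actually have, which is one simple way to keep every compute unit busy.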
