DPUV4E: High-Throughput DPU Architecture Design for CNN on Versal ACAP
Guoyu Li, Pengbo Zheng, Jian Weng, Enshan Yang (AMD)

TL;DR
This paper introduces DPUV4E, a high-throughput DPU architecture for CNN acceleration on AMD's Versal ACAP. It reports 8.6× the TOPS/W of traditional FPGA DPUs, a 95.8% reduction in DSP usage, and up to 2.2× higher throughput on depth-wise convolution models.
Contribution
The paper presents a DPU design tailored for Versal ACAP, offered in configurations from 2PE to 8PE and built around two computation units, Conv PE and DWC PE, that target different computational patterns. It also extends the DPU to non-convolutional operations.
Findings
8.6× the TOPS/W of traditional FPGA DPUs
95.8% reduction in DSP usage
Up to 2.2× higher throughput on depth-wise convolution models
Abstract
Convolutional Neural Networks (CNNs) remain prevalent in computer vision applications, and FPGAs, known for their flexibility and energy efficiency, have become essential components in heterogeneous acceleration systems. However, traditional FPGAs face challenges in balancing performance and versatility due to limited on-chip resources. AMD's Versal ACAP architecture, tailored for AI applications, incorporates AI Engines (AIEs) to deliver high computational power. Nevertheless, the platform suffers from insufficient memory bandwidth, hindering the full utilization of the AIEs' theoretical performance. In this paper, we present DPUV4E for the Versal architecture, providing configurations ranging from 2PE ( TOPS) to 8PE ( TOPS). We design two computation units, Conv PE and DWC PE, to support different computational patterns. Each computation unit's data flow efficiently…
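The abstract's two points, that memory bandwidth limits AIE utilization and that standard and depth-wise convolution get separate computation units, can be made concrete with a rough arithmetic-intensity estimate. The sketch below is not taken from the paper: the function `conv_stats`, the layer shape, the 1-byte (INT8) element size, and the idealized traffic model (each tensor moved exactly once) are all assumptions chosen for illustration.

```python
def conv_stats(h, w, c_in, c_out, k, depthwise=False, bytes_per_elem=1):
    """Return (MACs, bytes moved, MACs per byte) for one stride-1, same-padded layer.

    Hypothetical helper for illustration only; it does not model DPUV4E itself.
    """
    if depthwise:
        # Depth-wise: one k x k filter per channel, so output channels = input channels.
        c_out = c_in
        macs = h * w * c_in * k * k
        weights = c_in * k * k
    else:
        # Standard: every output channel reduces over all input channels.
        macs = h * w * c_in * c_out * k * k
        weights = c_in * c_out * k * k
    # Idealized traffic (assumption): read input once, read weights once, write output once.
    traffic = (h * w * c_in + weights + h * w * c_out) * bytes_per_elem
    return macs, traffic, macs / traffic


# Assumed example layer: 56 x 56 feature map, 128 channels, 3 x 3 kernel.
std = conv_stats(56, 56, 128, 128, 3)
dwc = conv_stats(56, 56, 128, 128, 3, depthwise=True)

print(f"standard conv:   {std[0]:>12,} MACs, {std[2]:8.1f} MACs/byte")
print(f"depth-wise conv: {dwc[0]:>12,} MACs, {dwc[2]:8.1f} MACs/byte")
```

Under these assumptions the depth-wise layer performs far fewer MACs per byte moved than the standard layer, so it is much more likely to be limited by memory bandwidth than by compute. That gap is one plausible reason for giving the two patterns different data flows.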
Taxonomy
Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Medical Imaging Techniques and Applications
Methods: Convolution
