DPUV4E: High-Throughput DPU Architecture Design for CNN on Versal ACAP

Guoyu Li; Pengbo Zheng; Jian Weng; Enshan Yang (AMD)

arXiv:2506.11441 · cs.AR · June 16, 2025

TL;DR

This paper introduces DPUV4E, a high-throughput DPU architecture for CNN acceleration on AMD's Versal ACAP. Compared with traditional FPGA-based DPUs, it achieves higher energy efficiency (TOPS/W), much lower DSP usage, and higher inference throughput.

Contribution

The paper presents a DPU design tailored to Versal ACAP. It offers configurations from 2PE to 8PE and two computation units (Conv PE and DWC PE) for different computational patterns, and it extends support to non-convolutional operations, improving both performance and resource efficiency.

Findings

01. 8.6× higher TOPS/W than traditional FPGA-based DPUs

02. 95.8% reduction in DSP usage

03. Up to 2.2× higher throughput on depth-wise convolution models

Abstract

Convolutional Neural Networks (CNNs) remain prevalent in computer vision applications, and FPGAs, known for their flexibility and energy efficiency, have become essential components in heterogeneous acceleration systems. However, traditional FPGAs face challenges in balancing performance and versatility due to limited on-chip resources. AMD's Versal ACAP architecture, tailored for AI applications, incorporates AI Engines (AIEs) to deliver high computational power. Nevertheless, the platform suffers from insufficient memory bandwidth, hindering the full utilization of the AIEs' theoretical performance. In this paper, we present DPUV4E for the Versal architecture, providing configurations ranging from 2PE (32.6 TOPS) to 8PE (131.0 TOPS). We design two computation units, Conv PE and DWC PE, to support different computational patterns. Each computation unit's data flow efficiently…
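The abstract cites two things: limited memory bandwidth as the bottleneck, and separate Conv PE and DWC PE units for different computational patterns. A rough way to see how these connect is to compare the arithmetic intensity (MACs per byte moved) of a standard and a depth-wise 3×3 convolution layer. The sketch below is illustrative only and does not come from the paper. It assumes int8 tensors (1 byte per element), stride 1 with "same" padding, and that each tensor crosses external memory exactly once. The layer shape is a hypothetical example. The sketch also divides the quoted 2PE and 8PE peak figures by PE count.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    """Hypothetical layer shape: stride 1, 'same' padding, int8 tensors (1 byte each)."""
    h: int          # output height
    w: int          # output width
    c_in: int       # input channels
    c_out: int      # output channels
    k: int          # kernel size (k x k)
    depthwise: bool = False


def macs(layer: ConvLayer) -> int:
    """Multiply-accumulate count for one layer."""
    if layer.depthwise:
        # Each output channel convolves only its own input channel.
        return layer.h * layer.w * layer.c_out * layer.k * layer.k
    return layer.h * layer.w * layer.c_out * layer.c_in * layer.k * layer.k


def bytes_moved(layer: ConvLayer) -> int:
    """Bytes transferred, assuming ideal reuse (each tensor moved exactly once)."""
    act_in = layer.h * layer.w * layer.c_in
    act_out = layer.h * layer.w * layer.c_out
    if layer.depthwise:
        weights = layer.c_out * layer.k * layer.k
    else:
        weights = layer.c_out * layer.c_in * layer.k * layer.k
    return act_in + act_out + weights


def intensity(layer: ConvLayer) -> float:
    """Arithmetic intensity in MACs per byte."""
    return macs(layer) / bytes_moved(layer)


std = ConvLayer(h=56, w=56, c_in=128, c_out=128, k=3)
dwc = ConvLayer(h=56, w=56, c_in=128, c_out=128, k=3, depthwise=True)

print(f"standard : {intensity(std):.1f} MACs/byte")
print(f"depthwise: {intensity(dwc):.1f} MACs/byte")

# Peak throughput figures quoted in the abstract, keyed by PE count.
tops = {2: 32.6, 8: 131.0}
for pes, total in tops.items():
    print(f"{pes}PE: {total / pes:.2f} TOPS per PE")
```

Under these assumptions, the depth-wise layer performs roughly two orders of magnitude fewer MACs per byte than the standard layer. It is therefore far more likely to be bandwidth-bound, which is consistent with the abstract's motivation for a dedicated DWC PE. The per-PE figures show that the quoted peaks scale almost linearly with PE count, at between 16.3 and 16.4 TOPS per PE.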

Taxonomy

Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Medical Imaging Techniques and Applications

Methods: Convolution