DPU or GPU for Accelerating Neural Networks Inference -- Why not both? Split CNN Inference

Ali Emre Oztas, Mahir Demir, James Garside, and Mikel Luján

arXiv:2605.00174·cs.AR·May 4, 2026

TL;DR

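This paper proposes split CNN inference, which partitions a network across a DPU and a GPU to reduce latency for video and image streaming on edge devices. It cuts latency by up to 2.48x over DPU-only and up to 3.37x over GPU-only execution, and a GNN-based predictor selects the partition with 96.27% accuracy.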

Contribution

It introduces a novel partitioning approach for CNN inference across DPU and GPU, including an automated GNN-based prediction method for optimal layer splitting.
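To make the idea of a layer split concrete, the following is a minimal sketch of choosing a split point by exhaustive search. It is not the paper's GNN-based predictor. The function name `best_split`, the per-layer latencies, and the transfer costs are all hypothetical values invented for illustration, and the cost model (DPU time, plus one transfer, plus GPU time for a single frame) is an assumption.

```python
from typing import Sequence, Tuple


def best_split(
    dpu_ms: Sequence[float],
    gpu_ms: Sequence[float],
    xfer_ms: Sequence[float],
) -> Tuple[int, float]:
    """Return (k, latency): layers [0, k) run on the DPU, layers [k, n) on the GPU.

    dpu_ms[i] / gpu_ms[i]: assumed latency of layer i on each device.
    xfer_ms[k]: assumed cost of moving the tensor that crosses a cut at k,
                so it needs n + 1 entries (k = 0 is GPU-only, k = n is DPU-only).
    """
    n = len(dpu_ms)
    if len(gpu_ms) != n or len(xfer_ms) != n + 1:
        raise ValueError("expected len(gpu_ms) == n and len(xfer_ms) == n + 1")

    best_k, best_latency = 0, float("inf")
    for k in range(n + 1):
        # Single-frame latency under the assumed additive cost model.
        latency = sum(dpu_ms[:k]) + xfer_ms[k] + sum(gpu_ms[k:])
        if latency < best_latency:
            best_k, best_latency = k, latency
    return best_k, best_latency


# Made-up numbers: early layers are cheap on the DPU, late layers on the GPU,
# and the intermediate tensor after layer 2 is small to transfer.
dpu = [1.0, 1.0, 4.0, 4.0]
gpu = [3.0, 3.0, 1.0, 1.0]
xfer = [5.0, 4.0, 1.0, 1.0, 2.0]
split, latency = best_split(dpu, gpu, xfer)
print(split, latency)
```

Exhaustive search is only practical when every candidate split can be measured or estimated. A learned predictor, such as the GNN the paper describes, would replace this loop.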

Findings

01. Up to 2.48x latency reduction over DPU-only execution.

02. Up to 3.37x latency reduction over GPU-only execution.

03. GNN-based partition prediction achieves 96.27% accuracy.

Abstract

Video and image streaming on edge devices requires low latency. Neural Networks (NNs) are widely used in these applications, and prior work mainly focuses on accelerating them with a single hardware unit such as a Graphics Processing Unit (GPU), a Field Programmable Gate Array (FPGA), or a Deep Learning Processing Unit (DPU). However, further reductions in latency can be obtained by combining these units. This paper proposes partitioning CNN inference across a DPU and a GPU (Split CNN Inference). The first partition, which consists of the initial CNN layers that process the input images, runs on the AI engines (DPU) of a Versal VCK190, near the source of the data. A GPU (NVIDIA RTX 2080), pipelined asynchronously with the DPU, runs the remaining layers. The GPU processes the second partition, albeit having reduced the data transfer between the data source (storage/camera)…
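The asynchronous two-stage pipeline described in the abstract can be sketched as a producer and a consumer joined by a bounded queue. This is only an illustration of the pipelining pattern, not the authors' implementation. `run_pipeline`, `first_partition`, and `second_partition` are hypothetical names, and the two stages are plain Python callables standing in for the DPU and GPU partitions rather than real device APIs.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

_STOP = object()  # sentinel marking the end of the frame stream


def run_pipeline(
    frames: Iterable[Any],
    first_partition: Callable[[Any], Any],
    second_partition: Callable[[Any], Any],
    depth: int = 2,
) -> List[Any]:
    """Run two stages concurrently, handing intermediate tensors over a queue."""
    handoff: "queue.Queue[Any]" = queue.Queue(maxsize=depth)
    results: List[Any] = []

    def stage_one() -> None:
        # Stand-in for the DPU: runs the initial layers on each frame.
        for frame in frames:
            handoff.put(first_partition(frame))  # blocks when the queue is full
        handoff.put(_STOP)

    def stage_two() -> None:
        # Stand-in for the GPU: runs the remaining layers.
        while True:
            item = handoff.get()
            if item is _STOP:
                break
            results.append(second_partition(item))

    workers = [threading.Thread(target=stage_one), threading.Thread(target=stage_two)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


# Toy stages: any callables work, since only the hand-off pattern matters here.
outputs = run_pipeline(range(5), lambda x: x * 2, lambda y: y + 1)
print(outputs)
```

With one producer and one consumer on a FIFO queue, results come out in frame order. The bounded `depth` keeps the first stage from running arbitrarily far ahead of the second.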
