Dual-pronged deep learning preprocessing on heterogeneous platforms with CPU, Accelerator and CSD

Jia Wei; Xingjun Zhang; Witold Pedrycz; Longxiang Wang; Jie Zhao

arXiv:2407.00005·cs.DC·January 30, 2026·1 cites

Open Access

TL;DR

This paper introduces DDLP, a novel deep learning preprocessing framework utilizing Computable Storage Devices to alleviate bottlenecks, improve speed, and reduce energy consumption across heterogeneous CPU, accelerator, and CSD platforms.

Contribution

DDLP is the first system to efficiently leverage CSDs for deep learning preprocessing, enabling parallel data handling and transfer, and balancing consistency and efficiency with adaptive strategies.

Findings

01. Improves ImageNet training speed by up to 23.5%.

02. Reduces energy consumption by 19.7%.

03. Decreases CPU and DRAM usage by 37.6%.

Abstract

For image-related deep learning tasks, the first step often involves reading data from external storage and preprocessing it on the CPU. As accelerators become faster and the number of accelerators per compute node grows, the gap in computing and data transfer capability between accelerators and CPUs widens, and data reading and preprocessing increasingly become the bottleneck of these tasks. Our work, DDLP, addresses the data computing and transfer bottleneck of deep learning preprocessing using Computable Storage Devices (CSDs). DDLP allows the CPU and the CSD to preprocess the dataset in parallel, each starting from an opposite end of it. To this end, we propose two adaptive dynamic selection strategies that let DDLP direct the accelerator to automatically read data from different sources. The two strategies trade off consistency against efficiency.…
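The "both ends" idea in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not DDLP's actual implementation: the function name `dual_pronged_schedule`, the fixed per-item costs `cpu_time` and `csd_time`, and the rule that the worker free earliest claims the next item are all hypothetical choices made for illustration. The abstract does not describe the real scheduling or the two selection strategies in this detail.

```python
def dual_pronged_schedule(num_items, cpu_time, csd_time):
    """Simulate two workers preprocessing one dataset from opposite ends.

    Hypothetical model: the CPU walks forward from index 0, the CSD walks
    backward from the last index, and each item costs a fixed amount of
    time on each device. Returns (cpu_indices, csd_indices).
    """
    head, tail = 0, num_items - 1      # next unclaimed item on each side
    cpu_clock, csd_clock = 0.0, 0.0    # time at which each worker is free
    cpu_indices, csd_indices = [], []

    while head <= tail:
        # The worker that is free earliest claims the next item on its
        # own side; ties go to the CPU.
        if cpu_clock <= csd_clock:
            cpu_indices.append(head)
            head += 1
            cpu_clock += cpu_time
        else:
            csd_indices.append(tail)
            tail -= 1
            csd_clock += csd_time

    return cpu_indices, csd_indices


if __name__ == "__main__":
    # Assumed costs: the CSD is four times slower per item than the CPU.
    cpu, csd = dual_pronged_schedule(10, cpu_time=1.0, csd_time=4.0)
    print("CPU preprocessed:", cpu)
    print("CSD preprocessed:", csd)
```

In this toy model the two index ranges never overlap and together cover the whole dataset, so no item is preprocessed twice. The slower device simply ends up with a smaller share, without the split point being fixed in advance.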

Taxonomy

Topics: Robotics and Automated Systems