Dual-pronged deep learning preprocessing on heterogeneous platforms with CPU, Accelerator and CSD
Jia Wei, Xingjun Zhang, Witold Pedrycz, Longxiang Wang, Jie Zhao

TL;DR
This paper introduces DDLP, a novel deep learning preprocessing framework utilizing Computable Storage Devices to alleviate bottlenecks, improve speed, and reduce energy consumption across heterogeneous CPU, accelerator, and CSD platforms.
Contribution
DDLP is the first system to efficiently leverage CSDs for deep learning preprocessing, enabling parallel data handling and transfer, and balancing consistency and efficiency with adaptive strategies.
Findings
Improves ImageNet training speed by up to 23.5%.
Reduces energy consumption by 19.7%.
Decreases CPU and DRAM usage by 37.6%.
Abstract
For image-related deep learning tasks, the first step often involves reading data from external storage and preprocessing it on the CPU. As accelerators become faster and the number of accelerators per compute node grows, the gap in computing and data transfer capability between accelerators and CPUs widens, and data reading and preprocessing increasingly become the bottleneck of these tasks. Our work, DDLP, addresses the data computing and transfer bottleneck of deep learning preprocessing using Computable Storage Devices (CSDs). DDLP lets the CPU and the CSD preprocess in parallel, each starting from an opposite end of the dataset. To this end, we propose two adaptive dynamic selection strategies that let DDLP direct the accelerator to automatically read data from different sources. The two strategies trade off between consistency and efficiency.…
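The "both ends of the dataset" idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `dual_pronged`, the callbacks `cpu_fn` and `csd_fn`, and the use of two Python threads to stand in for the CPU and the CSD are all assumptions made for illustration. One worker claims sample indices from the front, the other from the back, and both stop when they meet, so every sample is preprocessed exactly once.

```python
import threading


def dual_pronged(num_samples, cpu_fn, csd_fn):
    """Hypothetical sketch: two workers consume indices 0..num_samples-1
    from opposite ends until they meet in the middle."""
    lock = threading.Lock()
    bounds = {"lo": 0, "hi": num_samples - 1}  # unclaimed index range
    done = {"cpu": [], "csd": []}              # indices handled by each side

    def claim(side):
        # Atomically hand out the next index for this side, or None
        # once the two ends have crossed.
        with lock:
            if bounds["lo"] > bounds["hi"]:
                return None
            if side == "cpu":
                idx = bounds["lo"]
                bounds["lo"] += 1
            else:
                idx = bounds["hi"]
                bounds["hi"] -= 1
            return idx

    def worker(side, fn):
        while True:
            idx = claim(side)
            if idx is None:
                return
            fn(idx)                 # stand-in for the preprocessing step
            done[side].append(idx)

    threads = [
        threading.Thread(target=worker, args=("cpu", cpu_fn)),
        threading.Thread(target=worker, args=("csd", csd_fn)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


if __name__ == "__main__":
    result = dual_pronged(10, lambda i: None, lambda i: None)
    print("front worker:", result["cpu"])
    print("back worker:", result["csd"])
```

How the indices split between the two workers depends on their relative speed, which is the point of the scheme: the faster side simply covers more of the dataset without any static partitioning.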
Taxonomy
Topics: Robotics and Automated Systems
