Accelerating Data Loading in Deep Neural Network Training
Chih-Chieh Yang, Guojing Cong

TL;DR
This paper investigates and improves data loading performance in large-scale distributed deep neural network training, achieving over 30x data loading speedup on 256 nodes by making better use of CPU resources and reducing communication volume.
Contribution
It introduces a locality-aware data loading method utilizing software caches, addressing scalability issues and significantly enhancing data loading efficiency in distributed training.
Findings
Over 30x speedup in data loading with 256 nodes
Performance limited by I/O rate as scale increases
Locality-aware data loading reduces communication volume
Abstract
Data loading can dominate deep neural network training time on large-scale systems. We present a comprehensive study on accelerating data loading performance in large-scale distributed training. We first identify performance and scalability issues in current data loading implementations. We then propose optimizations to the data loader design that make better use of CPU resources. We use an analytical model to characterize the impact of data loading on the overall training time and establish the performance trend as we scale up distributed training. Our model suggests that I/O rate limits the scalability of distributed training, which inspires us to design a locality-aware data loading method. By utilizing software caches, our method can drastically reduce the data loading communication volume in comparison with the original data loading implementation. Finally, we evaluate the proposed…
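The idea of locality-aware loading with software caches can be illustrated with a small sketch. This is not the authors' implementation. It assumes each sample is already held in exactly one node's cache (the hypothetical `owner` map), that the global batch divides evenly across nodes, and that any sample assigned to a node other than its owner counts as one remote fetch. The function names are invented for illustration.

```python
from collections import defaultdict


def naive_assignment(batch, num_nodes):
    """Split the shuffled batch into equal contiguous chunks, ignoring caches."""
    assert len(batch) % num_nodes == 0, "sketch assumes an evenly divisible batch"
    per_node = len(batch) // num_nodes
    return {n: batch[n * per_node:(n + 1) * per_node] for n in range(num_nodes)}


def locality_aware_assignment(batch, owner, num_nodes):
    """Assign each sample to the node that caches it, then rebalance the surplus."""
    assert len(batch) % num_nodes == 0, "sketch assumes an evenly divisible batch"
    per_node = len(batch) // num_nodes
    assigned = defaultdict(list)
    overflow = []  # samples whose owning node is already full
    for sample in batch:
        node = owner[sample]
        if len(assigned[node]) < per_node:
            assigned[node].append(sample)
        else:
            overflow.append(sample)
    # Fill under-loaded nodes with the surplus; only these need remote fetches.
    for node in range(num_nodes):
        while len(assigned[node]) < per_node:
            assigned[node].append(overflow.pop())
    return dict(assigned)


def remote_fetches(assignment, owner):
    """Count samples assigned to a node that does not hold them in its cache."""
    return sum(
        1
        for node, samples in assignment.items()
        for sample in samples
        if owner[sample] != node
    )
```

Under these assumptions every node still receives the same number of samples per step, so the load balance of the naive split is preserved, while the number of remote fetches can never exceed that of the naive split. A real data loader would also have to handle cache population, eviction, and uneven batch sizes, which this sketch leaves out.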
