Accelerating Data Loading in Deep Neural Network Training

Chih-Chieh Yang; Guojing Cong

arXiv:1910.01196·cs.LG·February 20, 2020

TL;DR

This paper studies data loading performance in large-scale distributed deep neural network training and reports over 30x speedup in data loading on 256 nodes, obtained by making better use of CPU resources and reducing data loading communication volume.

Contribution

The paper introduces a locality-aware data loading method that uses software caches to reduce data loading communication volume, addressing the scalability limits of data loading in distributed training.

Findings

01. Over 30x speedup in data loading with 256 nodes
02. Performance limited by I/O rate as scale increases
03. Locality-aware data loading reduces communication volume
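
To make the third finding concrete, here is a minimal Python sketch of why assigning samples to the learner that already caches them can cut remote fetches. It is an illustration under stated assumptions, not the authors' implementation: the cache placement rule (`owner`), the rebalancing step, and all function names and default sizes are hypothetical choices made for this example.

```python
import random


def owner(sample_id, num_learners):
    """Hypothetical cache placement: sample i is cached on learner i % P."""
    return sample_id % num_learners


def baseline_assignment(batch, num_learners):
    """Split a global batch into contiguous slices, ignoring cache placement."""
    per = len(batch) // num_learners
    return [batch[p * per:(p + 1) * per] for p in range(num_learners)]


def locality_aware_assignment(batch, num_learners):
    """Give each learner the samples it caches, then rebalance the surplus
    so that every learner still receives the same number of samples."""
    per = len(batch) // num_learners
    local = [[] for _ in range(num_learners)]
    for s in batch:
        local[owner(s, num_learners)].append(s)
    surplus = []
    for p in range(num_learners):
        surplus.extend(local[p][per:])   # samples beyond this learner's share
        local[p] = local[p][:per]
    for p in range(num_learners):
        while len(local[p]) < per:       # fill deficits with surplus samples
            local[p].append(surplus.pop())
    return local


def remote_fetches(assignment, num_learners):
    """Count samples a learner must fetch because another learner caches them."""
    return sum(
        1
        for p, samples in enumerate(assignment)
        for s in samples
        if owner(s, num_learners) != p
    )


def simulate(num_samples=4096, num_learners=8, global_batch=256, seed=0):
    """Run one shuffled epoch and return (baseline, locality_aware) fetch counts."""
    rng = random.Random(seed)
    order = list(range(num_samples))
    rng.shuffle(order)
    base = aware = 0
    for start in range(0, num_samples, global_batch):
        batch = order[start:start + global_batch]
        base += remote_fetches(baseline_assignment(batch, num_learners), num_learners)
        aware += remote_fetches(locality_aware_assignment(batch, num_learners), num_learners)
    return base, aware
```

Calling `simulate()` returns the remote fetch counts for both strategies over one epoch. In this toy model the global batch contents are unchanged, so the random sampling order is preserved; only the mapping of samples to learners differs, and only the rebalanced surplus samples cross learner boundaries.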

Abstract

Data loading can dominate deep neural network training time on large-scale systems. We present a comprehensive study on accelerating data loading performance in large-scale distributed training. We first identify performance and scalability issues in current data loading implementations. We then propose optimizations to the data loader design that make better use of CPU resources. We use an analytical model to characterize the impact of data loading on the overall training time and establish the performance trend as we scale up distributed training. Our model suggests that I/O rate limits the scalability of distributed training, which inspires us to design a locality-aware data loading method. By utilizing software caches, our method can drastically reduce the data loading communication volume in comparison with the original data loading implementation. Finally, we evaluate the proposed…
