Characterizing Deep-Learning I/O Workloads in TensorFlow
Steven W. D. Chien, Stefano Markidis, Chaitanya Prasad Sishtla, Luis Santos, Pawel Herman, Sai Narasimhamurthy, Erwin Laure

TL;DR
This paper analyzes TensorFlow's I/O performance, identifies bottlenecks, and proposes a burst buffer solution to significantly improve checkpointing efficiency and overall training performance.
Contribution
It provides a detailed characterization of TensorFlow's I/O behavior and introduces a burst buffer method to enhance checkpointing performance.
Findings
Increasing the number of threads raises read bandwidth by up to 7.8x.
Prefetching overlaps computation with I/O, hiding the cost of I/O from overall training time.
Checkpointing through a burst buffer is 2.6x faster than checkpointing directly to slower storage.
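The burst buffer idea in the last finding can be illustrated with a minimal sketch. It uses only the Python standard library and is not the paper's implementation: the class name `BurstBufferCheckpointer`, the `fast_dir`/`slow_dir` parameters, and the `ckpt-<step>.bin` file naming are all hypothetical. The assumption is that a checkpoint is first written to a small, fast storage tier, and a background thread then copies it to a larger, slower tier so that training only waits for the fast write.

```python
import shutil
import tempfile
import threading
from pathlib import Path


class BurstBufferCheckpointer:
    """Hypothetical helper: write to a fast tier, drain to a slow tier in the background."""

    def __init__(self, fast_dir, slow_dir):
        self.fast_dir = Path(fast_dir)
        self.slow_dir = Path(slow_dir)
        self.fast_dir.mkdir(parents=True, exist_ok=True)
        self.slow_dir.mkdir(parents=True, exist_ok=True)
        self._pending = []

    def save(self, step, payload):
        # Blocking part: only the write to the fast tier stalls the caller.
        fast_path = self.fast_dir / f"ckpt-{step}.bin"
        fast_path.write_bytes(payload)
        # Non-blocking part: copy to the slow tier on a background thread.
        copier = threading.Thread(
            target=shutil.copy2,
            args=(fast_path, self.slow_dir / fast_path.name),
        )
        copier.start()
        self._pending.append(copier)
        return fast_path

    def wait(self):
        # Join all outstanding copies, e.g. before the job exits.
        for copier in self._pending:
            copier.join()
        self._pending.clear()


def demo():
    # Temporary directories stand in for the fast and slow storage tiers.
    with tempfile.TemporaryDirectory() as root:
        ckpt = BurstBufferCheckpointer(Path(root) / "fast", Path(root) / "slow")
        for step in range(3):
            ckpt.save(step, b"weights-at-step-%d" % step)
        ckpt.wait()
        return sorted(p.name for p in (Path(root) / "slow").iterdir())
```

Calling `demo()` returns the checkpoint file names that reached the slow tier. In a real setup the two directories would sit on different devices, and the speedup would depend on the bandwidth gap between them.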
Abstract
The performance of Deep-Learning (DL) computing frameworks relies on the performance of data ingestion and checkpointing. In fact, during training, a considerably high number of relatively small files are first loaded and pre-processed on CPUs and then moved to accelerators for computation. In addition, checkpointing and restart operations are carried out to allow DL computing frameworks to restart quickly from a checkpoint. Because of this, I/O affects the performance of DL applications. In this work, we characterize the I/O performance and scaling of TensorFlow, an open-source programming framework developed by Google and specifically designed for solving DL problems. To measure TensorFlow I/O performance, we first design a micro-benchmark to measure TensorFlow reads, and then use a TensorFlow mini-application based on AlexNet to measure the performance cost of I/O and checkpointing…
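The ingestion pattern described in the abstract (load many small items, pre-process them on CPU threads, then hand batches to the compute step) can be sketched as follows. This is a standard-library stand-in, not TensorFlow code: `load_and_preprocess`, `prefetching_loader`, and `train` are hypothetical names, and the arithmetic replaces real file reads and accelerator work. In TensorFlow itself, the `tf.data` API's parallel `map` and `prefetch` typically play this role.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def load_and_preprocess(item):
    # Hypothetical stand-in for reading one small file and decoding it on the CPU.
    return item * 2


def prefetching_loader(items, num_threads=4, buffer_size=2, batch_size=4):
    """Yield preprocessed batches while a background thread prepares the next ones."""
    buf = queue.Queue(maxsize=buffer_size)
    sentinel = object()

    def producer():
        # A thread pool loads the items of each batch in parallel.
        with ThreadPoolExecutor(max_workers=num_threads) as pool:
            for start in range(0, len(items), batch_size):
                chunk = items[start:start + batch_size]
                batch = list(pool.map(load_and_preprocess, chunk))
                buf.put(batch)  # blocks while the prefetch buffer is full
        buf.put(sentinel)  # signals the end of the data

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = buf.get()
        if batch is sentinel:
            return
        yield batch


def train(items):
    total = 0
    for batch in prefetching_loader(items):
        # Stand-in for the compute step; the producer keeps loading meanwhile.
        total += sum(batch)
    return total
```

Because the producer fills the bounded queue while the consumer works on the current batch, loading and computation overlap. `buffer_size` caps how far ahead the loader may run, which bounds memory use.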
