Throughput Prediction of Asynchronous SGD in TensorFlow
Zhuojin Li, Wumo Yan, Marco Paolieri, Leana Golubchik

TL;DR
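This paper presents a method to predict the throughput of asynchronous SGD in TensorFlow from single-node profiling traces. It models how multiple worker nodes interact with the parameter server and how their concurrent transmissions are scheduled, so that multi-node performance can be estimated before a cluster is provisioned.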
Contribution
The paper introduces an approach that predicts multi-node training throughput from profiling traces of a single-node configuration, accounting for overlaps between communication and computation.
Findings
Accurately predicts throughput on GPU and CPU clusters.
Validates predictions on AWS and in-house clusters.
Analyzes effects of data transmission policies.
Abstract
Modern machine learning frameworks can train neural networks using multiple nodes in parallel, each computing parameter updates with stochastic gradient descent (SGD) and sharing them asynchronously through a central parameter server. Due to communication overhead and bottlenecks, the total throughput of SGD updates in a cluster scales sublinearly, saturating as the number of nodes increases. In this paper, we present a solution to predicting training throughput from profiling traces collected from a single-node configuration. Our approach is able to model the interaction of multiple nodes and the scheduling of concurrent transmissions between the parameter server and each node. By accounting for the dependencies between received parts and pending computations, we predict overlaps between computation and communication and generate synthetic execution traces for configurations with…
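The sublinear scaling described in the abstract can be illustrated with a toy simulation. This is a minimal sketch and not the authors' trace-based method: it assumes every worker alternates a fixed compute time with a fixed-length exchange on the parameter-server link, and that the link serves one exchange at a time in FIFO order. The function name and the parameters `compute_s`, `comm_s`, and `horizon_s` are hypothetical, chosen only for this illustration.

```python
import heapq


def simulate_throughput(num_workers, compute_s, comm_s, horizon_s=1000.0):
    """Return completed updates per second in a toy asynchronous-SGD model.

    Assumption (not from the paper): the parameter-server link is an
    exclusive FIFO resource, and compute/communication never overlap
    within a single worker.
    """
    # Each entry is (time the worker finishes computing, worker id).
    ready = [(compute_s, w) for w in range(num_workers)]
    heapq.heapify(ready)
    link_free_at = 0.0
    updates = 0
    while ready:
        t_ready, worker = heapq.heappop(ready)
        # The exchange starts once the worker is ready and the link is free.
        start = max(t_ready, link_free_at)
        end = start + comm_s
        if end > horizon_s:
            break  # every later exchange would also end past the horizon
        link_free_at = end
        updates += 1
        # The worker starts its next gradient computation immediately.
        heapq.heappush(ready, (end + compute_s, worker))
    return updates / horizon_s


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        rate = simulate_throughput(n, compute_s=1.0, comm_s=0.25)
        print(f"{n:2d} workers: {rate:.2f} updates/s")
```

In this toy model throughput grows almost linearly while the link is mostly idle, then flattens near `1 / comm_s` once the link is always busy. The paper goes further by predicting overlaps between computation and communication, which this sketch deliberately ignores.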
Taxonomy
Methods: Stochastic Gradient Descent
