Throughput Prediction of Asynchronous SGD in TensorFlow

Zhuojin Li; Wumo Yan; Marco Paolieri; Leana Golubchik

arXiv:1911.04650 · cs.DC · March 2, 2020

TL;DR

This paper presents a method for predicting the throughput of asynchronous SGD in TensorFlow. It models the interaction of multiple nodes and the scheduling of concurrent transmissions between the parameter server and each node, so that multi-node performance can be estimated from single-node profiling.

Contribution

It introduces an approach for predicting training throughput from profiling traces collected on a single node, accounting for the overlap between computation and communication in multi-node setups.
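To make computation/communication overlap concrete, here is a minimal sketch under simplifying assumptions that are ours, not the paper's: a worker needs one parameter tensor per operation, tensors arrive back-to-back over a link of fixed bandwidth in the same order the operations run, and each operation starts once its tensor has arrived and the previous operation has finished. The names `Op` and `step_time` and the example sizes are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    """Hypothetical operation: needs one parameter tensor, then computes."""
    name: str
    tensor_bytes: float  # size of the parameter tensor this op depends on
    compute_s: float     # computation time once the tensor is available


def step_time(ops: List[Op], bandwidth_Bps: float, overlap: bool = True) -> float:
    """Duration of one step on a single worker under a toy model.

    Assumption: tensors are sent back-to-back in op order over a link of
    bandwidth_Bps bytes/s, and ops execute sequentially on one device.
    """
    if not ops:
        return 0.0

    # Arrival time of each tensor when transfers are serialized on the link.
    arrivals = []
    t = 0.0
    for op in ops:
        t += op.tensor_bytes / bandwidth_Bps
        arrivals.append(t)

    if not overlap:
        # No overlap: wait for every tensor, then run all ops.
        return arrivals[-1] + sum(op.compute_s for op in ops)

    # Overlap: an op starts when its tensor has arrived AND the previous op is done.
    finish = 0.0
    for op, ready in zip(ops, arrivals):
        finish = max(finish, ready) + op.compute_s
    return finish


ops = [Op("layer1", 1e6, 0.002), Op("layer2", 2e6, 0.004), Op("layer3", 4e6, 0.001)]
print(step_time(ops, 1e9, overlap=True))   # about 0.008 s
print(step_time(ops, 1e9, overlap=False))  # about 0.014 s
```

Under these assumptions the overlapped step takes roughly 8 ms instead of 14 ms. The predictor described in the paper works from profiled TensorFlow traces and their real dependency structure rather than this linear chain.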

Findings

01. Accurately predicts throughput on GPU and CPU clusters.
02. Validates predictions on AWS and in-house clusters.
03. Analyzes the effects of data transmission policies.
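One way to see why a transmission policy matters: when several workers request parameters from the parameter server at the same moment, serving them one at a time (FIFO) and splitting the link bandwidth equally among them (fair sharing) deliver the same bytes in the same total time, but at very different moments for individual workers. The sketch below assumes all requests start at time 0 on a single link of fixed bandwidth; the function names and the two policies are illustrative and are not necessarily the policies studied in the paper.

```python
from typing import List


def fifo_completions(sizes: List[float], bandwidth: float) -> List[float]:
    """Completion time of each transfer when the link serves them one at a time, in order."""
    t = 0.0
    done = []
    for s in sizes:
        t += s / bandwidth
        done.append(t)
    return done


def fair_share_completions(sizes: List[float], bandwidth: float) -> List[float]:
    """Completion times when all active transfers split the bandwidth equally.

    Assumption: every transfer starts at t = 0. Under equal sharing, the
    smallest remaining transfer always finishes next.
    """
    done = [0.0] * len(sizes)
    t = 0.0
    delivered = 0.0  # bytes delivered so far to each still-active transfer
    active = len(sizes)
    for i in sorted(range(len(sizes)), key=lambda k: sizes[k]):
        # Each of the `active` transfers receives bandwidth / active bytes/s.
        t += (sizes[i] - delivered) * active / bandwidth
        delivered = sizes[i]
        done[i] = t
        active -= 1
    return done


sizes = [1.0, 1.0, 1.0, 1.0]  # e.g. four workers each fetching 1 unit of parameters
bw = 1.0
fifo = fifo_completions(sizes, bw)        # [1.0, 2.0, 3.0, 4.0]
fair = fair_share_completions(sizes, bw)  # [4.0, 4.0, 4.0, 4.0]
print(sum(fifo) / len(fifo), sum(fair) / len(fair))  # 2.5 4.0
```

With identical transfers, FIFO lets the first worker start computing after 1 time unit, while fair sharing holds every worker until the end. The link is equally busy in both cases, but the policy changes when each worker can begin its computation.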

Abstract

Modern machine learning frameworks can train neural networks using multiple nodes in parallel, each computing parameter updates with stochastic gradient descent (SGD) and sharing them asynchronously through a central parameter server. Due to communication overhead and bottlenecks, the total throughput of SGD updates in a cluster scales sublinearly, saturating as the number of nodes increases. In this paper, we present a solution to predicting training throughput from profiling traces collected from a single-node configuration. Our approach is able to model the interaction of multiple nodes and the scheduling of concurrent transmissions between the parameter server and each node. By accounting for the dependencies between received parts and pending computations, we predict overlaps between computation and communication and generate synthetic execution traces for configurations with…
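The sublinear, saturating scaling described in the abstract can be illustrated with a classic closed queueing model. This is a textbook substitute, not the trace-based predictor the paper proposes: each of n workers alternates between computing for an average of Z seconds and exchanging an update at a single parameter-server queue with mean service demand D seconds. Exact Mean Value Analysis (MVA) gives the throughput for each n; it grows almost linearly while n is small and approaches the bottleneck bound 1/D as the server saturates. The values of Z and D below are made up for illustration, and MVA assumes a product-form network (for example, exponential service at the queue).

```python
from typing import List


def mva_throughput(n_max: int, compute_s: float, ps_demand_s: float) -> List[float]:
    """Exact MVA for a closed network with a delay station and one FCFS queue.

    compute_s:   mean computation time per step (delay station, no queueing)
    ps_demand_s: mean service demand per update at the parameter server
    Returns the system throughput (updates/s) for n = 1..n_max workers.
    """
    queue_len = 0.0  # mean number of workers at the PS with n - 1 workers
    throughputs = []
    for n in range(1, n_max + 1):
        # An arriving worker sees the queue length of the network with n - 1 workers.
        residence = ps_demand_s * (1.0 + queue_len)
        x = n / (compute_s + residence)  # cycle time = compute + PS residence
        queue_len = x * residence        # Little's law at the PS
        throughputs.append(x)
    return throughputs


xs = mva_throughput(32, compute_s=0.09, ps_demand_s=0.01)
for n in (1, 2, 4, 8, 16, 32):
    print(n, round(xs[n - 1], 1))  # saturates near 1 / 0.01 = 100 updates/s
```

With these illustrative numbers the knee lies near n* = (Z + D) / D = 10 workers; beyond it, adding workers barely raises throughput, which matches qualitatively the saturation the abstract describes.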

Taxonomy

Methods: Stochastic Gradient Descent