TicTac: Accelerating Distributed Deep Learning with Communication Scheduling

Sayed Hadi Hashemi; Sangeetha Abdu Jyothi; Roy H. Campbell

arXiv:1803.03288·cs.DC·October 5, 2018·26 cites

TL;DR

TicTac is a system that schedules communication in distributed deep learning. It enforces an order on network transfers, which reduces iteration time and straggler effects without requiring model changes.

Contribution

It introduces a communication scheduling method for distributed training with Parameter Servers that guarantees near-optimal overlap of communication and computation and improves performance without modifying models.

Findings

01

Up to 37.7% throughput increase in inference

02

Up to 19.2% throughput increase in training

03

Straggler effects reduced by up to 2.3 times

Abstract

State-of-the-art deep learning systems rely on iterative distributed training to tackle the increasing complexity of models and input data. The iteration time in these communication-heavy systems depends on the computation time, the communication time, and the extent to which computation and communication overlap. In this work, we identify a shortcoming in systems that represent computation as a graph, such as TensorFlow and PyTorch, that results in high variance in iteration time: parameters are received in a random order across workers. We develop a system, TicTac, that improves the iteration time by fixing this issue in distributed deep learning with Parameter Servers while guaranteeing near-optimal overlap of communication and computation. TicTac identifies and enforces an order of network transfers that improves the iteration time using prioritization. Our system is implemented over…
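To see why transfer order matters, the following toy simulation may help. It is a minimal sketch under strong assumptions and is not TicTac's scheduling algorithm: the model is a simple chain of layers, all parameters travel over one serial link, and layer i can start only once its own parameter has arrived and layer i-1 has finished. The function name `iteration_time` and the timing values are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def iteration_time(order, transfer, compute):
    """Finish time of one forward pass over a chain of layers (toy model).

    order    -- layer indices in the order their parameters are sent
                over a single serial link (an assumption of this sketch)
    transfer -- transfer[i] is the time to send layer i's parameter
    compute  -- compute[i] is the time to execute layer i
    """
    # Parameters arrive one after another on the serial link.
    arrival = {}
    clock = 0.0
    for layer in order:
        clock += transfer[layer]
        arrival[layer] = clock

    # Layer i waits for both its parameter and the previous layer.
    done = 0.0
    for layer in range(len(compute)):
        start = max(done, arrival[layer])
        done = start + compute[layer]
    return done


# Hypothetical timings: three layers, equal transfer and compute costs.
transfer = [2.0, 2.0, 2.0]
compute = [3.0, 3.0, 3.0]

in_order = iteration_time([0, 1, 2], transfer, compute)
reverse = iteration_time([2, 1, 0], transfer, compute)
all_times = [iteration_time(p, transfer, compute)
             for p in permutations(range(3))]

print("in order:", in_order)
print("reverse :", reverse)
print("best/worst over all orders:", min(all_times), max(all_times))
```

In this toy setting the total communication and computation times are identical for every order. Only the overlap changes, so a random arrival order yields a spread of iteration times, while a fixed, well-chosen order yields the shortest one consistently.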

Code & Models

Repositories

xldrx/tictac
tf

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data