CrossoverScheduler: Overlapping Multiple Distributed Training Applications in a Crossover Manner
Cheng Luo, Lei Qu, Youshan Miao, Peng Cheng, Yongqiang Xiong

TL;DR
CrossoverScheduler is a novel algorithm that enables multiple distributed deep learning tasks to share GPU resources through pipelined communication and computation, significantly improving training speed without affecting accuracy.
Contribution
The paper introduces Crossover Synchronization, which lets multiple distributed training applications time-share the same GPUs, and demonstrates its effectiveness with a prototype integrated into Horovod.
Findings
Achieves 20% speedup on ImageNet classification
Enables overlapping multiple training applications without accuracy loss
Improves distributed training throughput significantly
Abstract
Distributed deep learning workloads include throughput-intensive training tasks on GPU clusters, where distributed Stochastic Gradient Descent (SGD) incurs significant communication delays after backward propagation, forcing workers to wait for gradient synchronization, either via a centralized parameter server or directly among decentralized workers. We present CrossoverScheduler, an algorithm that enables the communication cycles of a distributed training application to be filled by other applications through pipelining communication and computation. With CrossoverScheduler, the running performance of distributed training can be significantly improved without sacrificing convergence rate or network accuracy. We achieve this by introducing Crossover Synchronization, which allows multiple distributed deep learning applications to time-share the same GPU alternately. The prototype of…
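The pipelining idea in the abstract can be illustrated with a toy timeline model. This is only a sketch under simplifying assumptions, not the paper's algorithm or its Horovod prototype: there is one shared GPU and one shared network link, every application has the same fixed compute and synchronization time per iteration, and an application may not start its next iteration until its own gradient synchronization has finished. The function names `crossover_makespan` and `back_to_back_makespan` and all parameters are hypothetical.

```python
def back_to_back_makespan(num_apps, iterations, compute, comm):
    """Total time when applications run one after another, so the GPU
    sits idle during every gradient synchronization."""
    return num_apps * iterations * (compute + comm)


def crossover_makespan(num_apps, iterations, compute, comm):
    """Total time when applications take turns on the GPU, so one
    application computes while another synchronizes gradients.

    Toy model (assumption): one GPU, one network link, round-robin order.
    """
    gpu_free = 0.0                 # time at which the GPU is next available
    net_free = 0.0                 # time at which the network is next available
    ready = [0.0] * num_apps       # time each app may start its next iteration

    for _ in range(iterations):
        for app in range(num_apps):
            # Forward/backward pass: needs the GPU and the app's previous sync.
            start = max(gpu_free, ready[app])
            gpu_free = start + compute
            # Gradient synchronization: needs the network, starts after compute.
            sync_start = max(net_free, gpu_free)
            net_free = sync_start + comm
            ready[app] = net_free

    return max(ready)


if __name__ == "__main__":
    serial = back_to_back_makespan(num_apps=2, iterations=10, compute=3, comm=1)
    overlapped = crossover_makespan(num_apps=2, iterations=10, compute=3, comm=1)
    print(f"back-to-back: {serial}, crossover: {overlapped}")
```

In this model, the benefit comes entirely from hiding one application's synchronization time behind another application's compute time, so the gain shrinks as communication becomes a smaller fraction of each iteration. Real systems add effects the sketch ignores, such as GPU memory pressure and context-switch cost.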
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Brain Tumor Detection and Classification
