Task allocation for decentralized training in heterogeneous environment
Yongyue Chao, Mingxue Liao, Jiaxin Gao

TL;DR
This paper proposes a static and self-adaptive data allocation algorithm for decentralized training in heterogeneous environments, significantly reducing training time and improving resource utilization.
Contribution
It introduces a novel self-adaptive data allocation method that enhances decentralized training efficiency in heterogeneous clusters, outperforming proportional allocation.
Findings
Self-adaptive allocation reduces training time by nearly 50%.
Algorithm improves resource utilization in heterogeneous environments.
Training time decreases with better overall cluster performance.
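The summary does not say how the self-adaptive method updates each worker's share, so the following is only a minimal sketch of one plausible feedback loop, not the authors' algorithm. It assumes each worker reports how long its last batch took, estimates throughput from that, and re-splits the same total batch proportionally. The function name `rebalance`, the smoothing factor `alpha`, and the rule of giving rounding leftovers to the fastest worker are all hypothetical.

```python
def rebalance(shares, elapsed, smoothed=None, alpha=0.5):
    """Re-split the total batch in proportion to measured worker throughput.

    shares   -- samples each worker processed in the last step
    elapsed  -- seconds each worker needed for those samples
    smoothed -- previous throughput estimates, or None on the first call
    alpha    -- hypothetical smoothing factor (1.0 trusts only the new measurement)
    """
    if len(shares) != len(elapsed) or any(t <= 0 for t in elapsed):
        raise ValueError("need one positive elapsed time per worker")

    total = sum(shares)
    measured = [n / t for n, t in zip(shares, elapsed)]  # samples per second

    if smoothed is None:
        smoothed = measured
    else:
        # Exponential smoothing damps noise from a single slow or fast step.
        smoothed = [alpha * m + (1 - alpha) * s for m, s in zip(measured, smoothed)]

    weight = sum(smoothed)
    new_shares = [int(total * s / weight) for s in smoothed]

    # Rounding down can leave a few samples unassigned; give them to the
    # worker with the highest estimated throughput (an arbitrary choice).
    fastest = max(range(len(new_shares)), key=lambda i: smoothed[i])
    new_shares[fastest] += total - sum(new_shares)
    return new_shares, smoothed


# Made-up measurements: two workers start with equal shares,
# but the second one takes four times as long.
new_shares, speeds = rebalance([400, 400], [1.0, 4.0])
print(new_shares)  # [640, 160]
```

With these made-up numbers both workers would need about 1.6 s for the next step if their speeds stay the same, instead of one idling for 3 s while the other finishes.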
Abstract
The demand for large-scale deep learning is increasing, and distributed training is the current mainstream solution. Ring AllReduce is widely used as a decentralized data-parallel algorithm. However, in a heterogeneous environment every worker processes the same amount of data, so faster workers spend much of their time waiting for slower ones; the algorithm therefore adapts poorly to heterogeneous clusters, and resources are underused. In this paper, we design and implement a static allocation algorithm. The dataset is manually allocated to the workers, and samples are drawn proportionally for training, which speeds up training in a heterogeneous environment. We verify the convergence of the network model and the effect on its training speed under this algorithm on a single machine with multiple cards and on multiple machines with multiple cards. On…
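The proportional split described in the abstract can be illustrated with a short sketch. It assumes each worker's relative speed is already known (for example from a benchmark run) and divides a fixed number of samples accordingly. The function names, the largest-remainder rounding, and the example numbers are assumptions made for illustration, not details taken from the paper.

```python
def allocate_proportionally(num_samples, speeds):
    """Split num_samples among workers in proportion to their speeds."""
    if num_samples < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need a non-negative sample count and positive speeds")

    total_speed = sum(speeds)
    exact = [num_samples * s / total_speed for s in speeds]
    shares = [int(x) for x in exact]  # round every share down first

    # Largest-remainder rounding: hand the leftover samples to the workers
    # whose exact share was cut the most by rounding down.
    leftover = num_samples - sum(shares)
    by_remainder = sorted(range(len(speeds)),
                          key=lambda i: exact[i] - shares[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return shares


def step_time(shares, speeds):
    """A synchronous step ends only when the slowest worker has finished."""
    return max(n / s for n, s in zip(shares, speeds))


# Made-up cluster: one fast worker and two slower ones (samples per second).
speeds = [4.0, 2.0, 2.0]

equal = allocate_proportionally(800, [1, 1, 1])     # same load for everyone
proportional = allocate_proportionally(800, speeds)

print(equal, step_time(equal, speeds))                # [267, 267, 266] 133.5
print(proportional, step_time(proportional, speeds))  # [400, 200, 200] 100.0
```

In the equal split the fast worker finishes in under 67 s and then idles until the slower workers finish at 133.5 s; with the proportional split all three finish together at 100 s.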
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Privacy-Preserving Technologies in Data
