Weighted parallel SGD for distributed unbalanced-workload training system
Cheng Daning, Li Shigang, Zhang Yunquan

TL;DR
This paper introduces WP-SGD, a weighted parallel stochastic gradient descent algorithm for heterogeneous distributed systems. It compensates for unbalanced node performance by weighting the model parameters from each node when combining them, so nodes do not need to consume equal quantities of data.
Contribution
The paper proposes WP-SGD, a novel weighted aggregation method for parallel SGD that handles unbalanced workloads in heterogeneous environments, with theoretical analysis and experimental validation.
Findings
WP-SGD outperforms traditional parallel SGD in unbalanced systems.
WP-SGD reduces the impact of low-performance nodes on training accuracy.
Experimental results confirm improved efficiency in heterogeneous distributed training.
Abstract
Stochastic gradient descent (SGD) is a popular stochastic optimization method in machine learning. Traditional parallel SGD algorithms, e.g., SimuParallel SGD, often require all nodes to have the same performance or to consume equal quantities of data. However, these requirements are difficult to satisfy when the parallel SGD algorithms run in a heterogeneous computing environment; low-performance nodes will exert a negative influence on the final result. In this paper, we propose an algorithm called weighted parallel SGD (WP-SGD). WP-SGD combines weighted model parameters from different nodes in the system to produce the final output. WP-SGD makes use of the reduction in standard deviation to compensate for the loss from the inconsistency in performance of nodes in the cluster, which means that WP-SGD does not require that all nodes consume equal quantities of data. We also analyze the…
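The combination step described in the abstract can be sketched as a weighted average of per-node parameter vectors. The abstract is cut off before it says how the weights are chosen, so this minimal sketch uses each node's processed-sample count as a stand-in weight. That choice, along with the names `weighted_combine`, `node_models`, and `samples_seen`, is an assumption made for illustration and is not the weighting derived in the paper.

```python
from typing import List, Sequence


def weighted_combine(
    models: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Return the weighted average of per-node parameter vectors.

    The weights are normalized by their sum, so only their ratios matter.
    """
    if not models or len(models) != len(weights):
        raise ValueError("need one weight per model and at least one model")
    total = float(sum(weights))
    if total <= 0.0:
        raise ValueError("weights must sum to a positive value")
    dim = len(models[0])
    if any(len(m) != dim for m in models):
        raise ValueError("all parameter vectors must have the same length")
    return [
        sum(w * m[j] for w, m in zip(weights, models)) / total
        for j in range(dim)
    ]


# Hypothetical data: three nodes, the third is slow and saw far fewer samples.
node_models = [[1.0, 2.0], [1.2, 1.8], [3.0, 0.0]]

# Placeholder weights (assumption): samples processed by each node.
samples_seen = [1000, 1000, 100]

combined = weighted_combine(node_models, samples_seen)

# Uniform weights correspond to plain parameter averaging across nodes.
uniform = weighted_combine(node_models, [1, 1, 1])
```

With these placeholder numbers, the slow node's outlying parameters pull the uniform average much further than the weighted one, which is the intuition behind down-weighting low-performance nodes.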
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Advanced Neural Network Applications
Methods: Stochastic Gradient Descent
