Weighted parallel SGD for distributed unbalanced-workload training system

Cheng Daning; Li Shigang; Zhang Yunquan

arXiv:1708.04801·cs.LG·August 17, 2017·1 cites

Open Access

TL;DR

This paper introduces WP-SGD, a weighted parallel stochastic gradient descent algorithm designed for heterogeneous distributed systems, which improves training efficiency by compensating for unbalanced node performance without requiring equal data consumption.

Contribution

The paper proposes WP-SGD, a novel weighted aggregation method for parallel SGD that handles unbalanced workloads in heterogeneous environments, with theoretical analysis and experimental validation.

Findings

1. WP-SGD outperforms traditional parallel SGD in unbalanced systems.

2. WP-SGD reduces the impact of low-performance nodes on training accuracy.

3. Experimental results confirm improved efficiency in heterogeneous distributed training.

Abstract

Stochastic gradient descent (SGD) is a popular stochastic optimization method in machine learning. Traditional parallel SGD algorithms, e.g., SimuParallel SGD, often require all nodes to have the same performance or to consume equal quantities of data. However, these requirements are difficult to satisfy when the parallel SGD algorithms run in a heterogeneous computing environment; low-performance nodes will exert a negative influence on the final result. In this paper, we propose an algorithm called weighted parallel SGD (WP-SGD). WP-SGD combines weighted model parameters from different nodes in the system to produce the final output. WP-SGD makes use of the reduction in standard deviation to compensate for the loss from the inconsistency in performance of nodes in the cluster, which means that WP-SGD does not require that all nodes consume equal quantities of data. We also analyze the…
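
The weighted combination step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `local_sgd` and `weighted_combine`, the toy 1-D regression problem, and the per-node step counts are all hypothetical. In particular, weighting each node by the number of SGD steps it completed is a placeholder assumption, since the abstract does not state the weighting rule WP-SGD actually uses.

```python
import random


def local_sgd(data, steps, lr=0.1, lam=0.01, seed=0):
    """Plain SGD on a 1-D L2-regularised least-squares model y ~ w * x."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        x, y = rng.choice(data)
        # Gradient of 0.5 * (w*x - y)^2 + 0.5 * lam * w^2 with respect to w.
        grad = (w * x - y) * x + lam * w
        w -= lr * grad
    return [w]  # returned as a parameter vector of length 1


def weighted_combine(models, weights):
    """Weighted average of per-node parameter vectors."""
    if not models or len(models) != len(weights):
        raise ValueError("need one weight per model")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(models[0])
    return [sum(w * m[j] for m, w in zip(models, weights)) / total
            for j in range(dim)]


# Hypothetical unbalanced cluster: each node completes a different
# number of SGD steps in the same wall-clock budget.
data = [(x / 10.0, 2.0 * x / 10.0) for x in range(1, 11)]  # samples of y = 2x
steps_per_node = [50, 200, 800]
models = [local_sgd(data, s, seed=i) for i, s in enumerate(steps_per_node)]

# ASSUMPTION: weight each node by its step count. This is a placeholder
# rule for illustration only, not the weighting derived in the paper.
combined = weighted_combine(models, steps_per_node)

# Equal weights, as in a parallel SGD scheme that simply averages the nodes.
uniform = weighted_combine(models, [1, 1, 1])

print("per-node:", models)
print("weighted:", combined)
print("uniform: ", uniform)
```

With positive weights the combined parameter is a convex combination of the per-node parameters, so nodes that consumed less data can be down-weighted rather than discarded or waited for.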

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Advanced Neural Network Applications

Methods: Stochastic Gradient Descent