Communication-Efficient Local SGD with Age-Based Worker Selection

Feng Zhu; Jingjing Zhang; Xin Wang

arXiv:2210.17073 · cs.IT · January 2, 2023

Open Access

TL;DR

This paper introduces AgeSel, an age-based worker selection strategy for distributed local SGD. It improves communication efficiency and convergence speed by using the workers' ages to balance how often each worker participates.

Contribution

The paper proposes a novel age-based worker selection method, AgeSel, which enhances communication efficiency and convergence in distributed local SGD with heterogeneous data.

Findings

1. AgeSel reduces training rounds to reach target accuracy.

2. It significantly cuts communication costs.

3. The hyper-parameter influences the effectiveness of worker selection.

Abstract

A major bottleneck of distributed learning under the parameter-server (PS) framework is the communication cost due to frequent bidirectional transmissions between the PS and workers. To address this issue, local stochastic gradient descent (SGD) and worker selection have been exploited to reduce the communication frequency and the number of participating workers at each round, respectively. However, partial participation can be detrimental to the convergence rate, especially for heterogeneous local datasets. In this paper, to improve communication efficiency and speed up the training process, we develop a novel worker selection strategy named AgeSel. The key enabler of AgeSel is the utilization of the ages of workers to balance their participation frequencies. The convergence of local SGD with the proposed age-based partial worker participation is rigorously established. Simulation results…
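
The abstract describes the idea but not the exact selection rule, so the following Python sketch is only an illustration under stated assumptions, not the authors' algorithm. It assumes that a worker's age is the number of rounds since it last participated, that the PS first picks workers whose age has reached a threshold (oldest first), and that any remaining slots are filled uniformly at random. The names `select_workers`, `update_ages`, `local_sgd`, `train`, and `age_threshold`, as well as the toy scalar objective, are hypothetical.

```python
import random


def select_workers(ages, num_selected, age_threshold, rng):
    """Assumed rule: prefer workers whose age reached the threshold, oldest first."""
    stale = sorted(
        (w for w, a in enumerate(ages) if a >= age_threshold),
        key=lambda w: -ages[w],
    )
    chosen = stale[:num_selected]
    if len(chosen) < num_selected:
        # Fill the remaining slots uniformly at random (an assumption).
        rest = [w for w in range(len(ages)) if w not in chosen]
        chosen += rng.sample(rest, num_selected - len(chosen))
    return chosen


def update_ages(ages, chosen):
    """Reset the age of participating workers; everyone else gets one round older."""
    return [0 if w in chosen else a + 1 for w, a in enumerate(ages)]


def local_sgd(x, target, steps, lr, rng):
    """Several noisy gradient steps on the toy local loss 0.5 * (x - target)^2."""
    for _ in range(steps):
        grad = (x - target) + rng.gauss(0.0, 0.01)
        x -= lr * grad
    return x


def train(targets, rounds, num_selected, age_threshold,
          local_steps=5, lr=0.1, seed=0):
    """Local SGD with partial participation; the PS averages the returned models."""
    rng = random.Random(seed)
    ages = [0] * len(targets)
    counts = [0] * len(targets)  # how often each worker participated
    x = 0.0
    for _ in range(rounds):
        chosen = select_workers(ages, num_selected, age_threshold, rng)
        local_models = [local_sgd(x, targets[w], local_steps, lr, rng)
                        for w in chosen]
        x = sum(local_models) / len(local_models)
        ages = update_ages(ages, chosen)
        for w in chosen:
            counts[w] += 1
    return x, counts


if __name__ == "__main__":
    # Heterogeneous toy data: each worker has a different local optimum.
    model, counts = train([-2.0, -1.0, 0.0, 1.0, 2.0, 6.0],
                          rounds=60, num_selected=2, age_threshold=2)
    print("final model:", model)
    print("participation counts:", counts)
```

In this sketch, `age_threshold` controls how strongly participation is balanced: a small value pushes the schedule toward round-robin, while a large value leaves it close to uniform random selection. Whether this matches the hyper-parameter mentioned in the findings is not stated in the text above.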

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Privacy-Preserving Technologies in Data