Optimization-based Block Coordinate Gradient Coding for Mitigating Partial Stragglers in Distributed Learning

Qi Wang; Ying Cui; Chenglin Li; Junni Zou; Hongkai Xiong

arXiv:2206.02450·cs.IT·April 26, 2023

Open Access

TL;DR

This paper introduces a gradient coding scheme for distributed learning systems that mitigates partial stragglers, using discrete and continuous optimization methods to optimize runtime and completion probability.

Contribution

It proposes a coordinate gradient coding scheme in which each coordinate can have its own diversity, transforms the resulting optimization problems into simpler forms, and provides efficient algorithms for practical implementation.

Findings

01

The scheme effectively mitigates partial stragglers in distributed learning.

02

The proposed algorithms achieve near-optimal runtime and completion probability.

03

The method reduces computational complexity compared to existing approaches.

Abstract

Gradient coding schemes effectively mitigate full stragglers in distributed learning by introducing identical redundancy in coded local partial derivatives corresponding to all model parameters. However, they are no longer effective for partial stragglers as they cannot utilize incomplete computation results from partial stragglers. This paper aims to design a new gradient coding scheme for mitigating partial stragglers in distributed learning. Specifically, we consider a distributed system consisting of one master and N workers, characterized by a general partial straggler model, and focus on solving a general large-scale machine learning problem with L model parameters using gradient coding. First, we propose a coordinate gradient coding scheme with L coding parameters representing L possibly different diversities for the L coordinates, which generates most gradient coding schemes.…
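
To make the baseline idea concrete, here is a minimal sketch of a classic fractional-repetition style of gradient coding for full stragglers. It is not the coordinate scheme proposed in this paper. It assumes N workers, N data partitions, a tolerance of s full stragglers, and that (s + 1) divides N; the function names are hypothetical and chosen only for illustration. Every coordinate gets the same redundancy, and a worker's message counts only if it is complete, which is the limitation with partial stragglers that the abstract describes.

```python
def fractional_repetition_assignment(num_workers, s):
    """Assign data partitions so that any s full stragglers can be tolerated.

    Workers are split into groups of size s + 1, and every worker in a group
    holds the same s + 1 partitions. Assumes (s + 1) divides num_workers.
    """
    if num_workers % (s + 1) != 0:
        raise ValueError("this sketch assumes (s + 1) divides num_workers")
    group_size = s + 1
    assignment = {}
    for w in range(num_workers):
        g = w // group_size
        assignment[w] = list(range(g * group_size, (g + 1) * group_size))
    return assignment


def worker_message(partial_grads, partitions):
    """Coded message: coordinate-wise sum of the partial gradients a worker holds."""
    dim = len(partial_grads[0])
    return [sum(partial_grads[p][i] for p in partitions) for i in range(dim)]


def decode(messages, num_workers, s):
    """Recover the full gradient from the messages of non-straggling workers.

    `messages` maps worker index -> completed coded message. One finished
    worker per group is enough, because group members send identical sums.
    """
    group_size = s + 1
    dim = len(next(iter(messages.values())))
    total = [0.0] * dim
    for g in range(num_workers // group_size):
        members = range(g * group_size, (g + 1) * group_size)
        survivor = next((w for w in members if w in messages), None)
        if survivor is None:
            raise RuntimeError("an entire group straggled; cannot decode")
        total = [t + m for t, m in zip(total, messages[survivor])]
    return total
```

With 6 workers and s = 2, for example, the master can decode as soon as one worker from {0, 1, 2} and one from {3, 4, 5} has finished. Work completed by the other four workers is discarded, even if they were nearly done.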

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Domain Adaptation and Few-Shot Learning