Optimization-based Block Coordinate Gradient Coding for Mitigating Partial Stragglers in Distributed Learning
Qi Wang, Ying Cui, Chenglin Li, Junni Zou, Hongkai Xiong

TL;DR
This paper introduces a gradient coding scheme for distributed learning systems that mitigates partial stragglers, optimizing runtime and completion probability through discrete and continuous optimization methods.
Contribution
It proposes a coordinate gradient coding scheme in which each of the L coordinates can have its own, possibly different, diversity. It transforms the resulting complex optimization problems into simpler forms and provides efficient algorithms for practical implementation.
Findings
The scheme effectively mitigates partial stragglers in distributed learning.
The proposed algorithms achieve near-optimal runtime and completion probability.
The method reduces computational complexity compared to existing approaches.
Abstract
Gradient coding schemes effectively mitigate full stragglers in distributed learning by introducing identical redundancy in coded local partial derivatives corresponding to all model parameters. However, they are no longer effective for partial stragglers, as they cannot utilize incomplete computation results from partial stragglers. This paper aims to design a new gradient coding scheme for mitigating partial stragglers in distributed learning. Specifically, we consider a distributed system consisting of one master and N workers, characterized by a general partial straggler model, and focus on solving a general large-scale machine learning problem with L model parameters using gradient coding. First, we propose a coordinate gradient coding scheme with L coding parameters representing L possibly different diversities for the L coordinates, which generates most gradient coding schemes.…
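To make the abstract's starting point concrete, here is a minimal sketch of a conventional gradient coding scheme with identical redundancy across all coordinates. It is not the scheme proposed in this paper. It assumes the fractional repetition construction from the earlier gradient coding literature, in which N workers are split into groups of s + 1 and every worker in a group holds the same s + 1 data partitions. The function names, the toy integer gradients, and the choice N = 4, s = 1 are all illustrative assumptions.

```python
from typing import Dict, List, Optional

Vector = List[int]


def assigned_partitions(worker: int, s: int) -> List[int]:
    """Partitions held by `worker` (assumed fractional repetition layout):
    workers are grouped in blocks of s + 1, and every worker in group g
    holds partitions g*(s+1), ..., (g+1)*(s+1) - 1."""
    group = worker // (s + 1)
    return list(range(group * (s + 1), (group + 1) * (s + 1)))


def worker_message(worker: int, s: int, partial_grads: List[Vector]) -> Vector:
    """Coded message: coordinate-wise sum of the worker's partial gradients."""
    parts = assigned_partitions(worker, s)
    dim = len(partial_grads[0])
    return [sum(partial_grads[p][l] for p in parts) for l in range(dim)]


def decode(messages: Dict[int, Vector], num_workers: int, s: int) -> Vector:
    """Recover the full gradient from the workers that finished.

    Needs one complete message per group, which is guaranteed whenever
    at most s workers are full stragglers."""
    if num_workers % (s + 1) != 0:
        raise ValueError("num_workers must be a multiple of s + 1")
    total: Optional[Vector] = None
    for g in range(num_workers // (s + 1)):
        members = range(g * (s + 1), (g + 1) * (s + 1))
        survivor = next((w for w in members if w in messages), None)
        if survivor is None:
            raise RuntimeError(f"every worker in group {g} straggled")
        msg = messages[survivor]
        total = list(msg) if total is None else [a + b for a, b in zip(total, msg)]
    if total is None:
        raise RuntimeError("no groups to decode")
    return total


# Toy example: N = 4 workers, 4 data partitions, L = 3 coordinates, s = 1.
partial_grads = [[1, 0, 2], [0, 3, 1], [2, 2, 0], [1, 1, 1]]
finished = [0, 2, 3]  # worker 1 is a full straggler
messages = {w: worker_message(w, 1, partial_grads) for w in finished}
recovered = decode(messages, num_workers=4, s=1)
print(recovered)  # prints [4, 6, 4]
```

In this sketch a message is all-or-nothing: a worker that has finished only some of its L coordinates contributes nothing to `decode`. This is the limitation the abstract describes for partial stragglers, and it is what motivates giving each coordinate its own redundancy level.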
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Domain Adaptation and Few-Shot Learning
