Sequential Gradient Coding For Straggler Mitigation

M. Nikhil Krishnan; MohammadReza Ebrahimi; Ashish Khisti

arXiv:2211.13802·cs.LG·June 29, 2023

TL;DR

This paper introduces two gradient coding schemes that exploit the temporal dimension, including selective repetition of unfinished tasks, to mitigate stragglers in distributed neural network training, reducing runtime by up to 16% over baseline GC.

Contribution

The main contribution is a pair of gradient coding schemes that code across rounds as well as across computing nodes; the first combines GC with selective repetition of previously unfinished tasks.

Findings

01. Achieved up to 16% reduction in runtime over baseline GC.

02. Demonstrated effectiveness in a practical AWS Lambda cluster setting.

03. Improved straggler mitigation through adaptive task multiplexing.

Abstract

In distributed computing, slower nodes (stragglers) usually become a bottleneck. Gradient Coding (GC), introduced by Tandon et al., is an efficient technique that uses principles of error-correcting codes to distribute gradient computation in the presence of stragglers. In this paper, we consider the distributed computation of a sequence of gradients $\{g(1), g(2), \dots, g(J)\}$, where processing of each gradient $g(t)$ starts in round $t$ and finishes by round $(t + T)$. Here $T \geq 0$ denotes a delay parameter. For the GC scheme, coding is only across computing nodes, and this results in a solution where $T = 0$. On the other hand, having $T > 0$ allows for designing schemes which exploit the temporal dimension as well. In this work, we propose two schemes that demonstrate improved performance compared to GC. Our first scheme combines GC with selective repetition of previously unfinished…
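
To make the baseline concrete, the following is a minimal sketch of plain GC (the $T = 0$ case) using the well-known three-worker, one-straggler example from Tandon et al. It is not an implementation of either scheme proposed in this paper. The names `B`, `DECODE`, `encode`, and `decode` are hypothetical, and the partial gradients are made-up numbers chosen only for illustration.

```python
from itertools import combinations

# Encoding coefficients (assumed layout): row i holds the weights that
# worker i applies to the partial gradients g1, g2, g3.
B = [
    [0.5, 1.0, 0.0],   # worker 0 sends g1/2 + g2
    [0.0, 1.0, -1.0],  # worker 1 sends g2 - g3
    [0.5, 0.0, 1.0],   # worker 2 sends g1/2 + g3
]

# Decoding weights for every pair of responsive workers; each pair's
# weighted sum of coded results equals g1 + g2 + g3.
DECODE = {
    (0, 1): (2.0, -1.0),
    (0, 2): (1.0, 1.0),
    (1, 2): (1.0, 2.0),
}


def encode(worker, partials):
    """Return the coded vector that `worker` would send to the master."""
    dim = len(partials[0])
    return [
        sum(B[worker][k] * partials[k][d] for k in range(3))
        for d in range(dim)
    ]


def decode(results):
    """Recover the full gradient from exactly two workers' coded vectors.

    `results` maps a worker index to that worker's coded vector.
    """
    pair = tuple(sorted(results))
    a, b = DECODE[pair]
    u, v = results[pair[0]], results[pair[1]]
    return [a * x + b * y for x, y in zip(u, v)]


# Illustrative partial gradients (one per data partition), dimension 2.
partials = [[1.0, 2.0], [3.0, -1.0], [0.5, 4.0]]
full_gradient = [sum(col) for col in zip(*partials)]

# Whichever single worker straggles, the other two suffice.
recovered = {}
for pair in combinations(range(3), 2):
    results = {w: encode(w, partials) for w in pair}
    recovered[pair] = decode(results)
```

Every entry of `recovered` matches `full_gradient`, so the master can proceed as soon as any two of the three workers respond. All coding here happens across workers within one round; the schemes in the paper additionally spread work across rounds, which this sketch does not attempt to model.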

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Ferroelectric and Negative Capacitance Devices