ErasureHead: Distributed Gradient Descent without Delays Using Approximate Gradient Coding
Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos

TL;DR
ErasureHead applies approximate gradient coding to distributed gradient descent: it recovers an inexact gradient at each iteration in exchange for higher delay tolerance, and shows significant speedups over both standard and gradient coded GD.
Contribution
Proposes an approximate gradient coding technique that raises delay tolerance in distributed gradient descent, backed by a performance analysis that combines delay and convergence guarantees.
Findings
Converges as quickly as standard GD up to a small noise floor.
Achieves faster overall runtime under probabilistic delay models.
Demonstrates significant speedups over both standard and gradient coded GD in experiments on real-world datasets and distributed clusters.
Abstract
We present ErasureHead, a new approach for distributed gradient descent (GD) that mitigates system delays by employing approximate gradient coding. Gradient coded distributed GD uses redundancy to exactly recover the gradient at each iteration from a subset of compute nodes. ErasureHead instead uses approximate gradient codes to recover an inexact gradient at each iteration, but with higher delay tolerance. Unlike prior work on gradient coding, we provide a performance analysis that combines both delay and convergence guarantees. We establish that down to a small noise floor, ErasureHead converges as quickly as distributed GD and has faster overall runtime under a probabilistic delay model. We conduct extensive experiments on real world datasets and distributed clusters and demonstrate that our method can lead to significant speedups over both standard and gradient coded GD.
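The difference between exact and approximate recovery described in the abstract can be illustrated with a small sketch. It assumes a fractional-repetition-style assignment, in which workers are split into disjoint groups and every worker in a group sends the same sum of partition gradients; this assignment, the function names (`assign_groups`, `worker_message`, `approximate_gradient`), and the toy gradients are assumptions made for illustration, not necessarily the authors' exact scheme. The master sums one message per group that has at least one responsive worker and skips groups whose workers all straggled.

```python
def assign_groups(num_workers, group_size):
    """Split workers 0..n-1 into disjoint groups of equal size.

    Assumption: partition p lives on the group containing worker p,
    so every worker in a group holds the same data partitions.
    """
    if num_workers % group_size != 0:
        raise ValueError("group_size must divide num_workers")
    return [list(range(start, start + group_size))
            for start in range(0, num_workers, group_size)]


def worker_message(partition_grads, group):
    """Message sent by any worker in `group`: the sum of its partitions' gradients."""
    dim = len(partition_grads[0])
    return [sum(partition_grads[p][j] for p in group) for j in range(dim)]


def approximate_gradient(partition_grads, groups, returned):
    """Sum one message per group with at least one non-straggling worker.

    Groups whose workers all straggled are treated as erasures and skipped,
    so the result is exact only when every group is recovered.
    """
    dim = len(partition_grads[0])
    total = [0.0] * dim
    recovered = 0
    for group in groups:
        if any(w in returned for w in group):
            msg = worker_message(partition_grads, group)
            total = [t + m for t, m in zip(total, msg)]
            recovered += 1
    return total, recovered


# Toy data: 4 partitions with 2-dimensional gradients, 4 workers in 2 groups.
partition_grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
groups = assign_groups(4, 2)

# All workers respond: the full gradient is recovered.
full, full_groups = approximate_gradient(partition_grads, groups, {0, 1, 2, 3})

# Only worker 1 responds: group 0 is recovered, group 1 is erased.
approx, approx_groups = approximate_gradient(partition_grads, groups, {1})
```

In this sketch, redundancy within a group means a single responsive worker is enough to recover that group's contribution. The master can therefore proceed without waiting for stragglers, at the cost of an inexact gradient whenever an entire group is missing.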
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data
