Redundancy Techniques for Straggler Mitigation in Distributed Optimization and Learning
Can Karakus, Yifan Sun, Suhas Diggavi, Wotao Yin

TL;DR
This paper introduces a redundancy-based distributed optimization framework that mitigates straggler effects by encoding the data into an over-complete representation, so that the optimization still converges when some nodes are delayed or fail.
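As a worked illustration of why redundancy lets stragglers be dropped, consider a least-squares objective (an assumed example). The symbols S (encoding matrix with N = βn rows, split row-wise into worker blocks S_1, …, S_m), β (redundancy factor) and A_t (set of non-straggling workers at iteration t) are introduced here for illustration:

```latex
\begin{aligned}
f(w) &= \tfrac{1}{2}\,\lVert Xw - y \rVert_2^2, \\
\tilde f(w) &= \tfrac{1}{2}\,\lVert S(Xw - y) \rVert_2^2
  = \tfrac{1}{2} \sum_{i=1}^{m} \lVert S_i (Xw - y) \rVert_2^2 \\
&= \tfrac{1}{2}\,(Xw - y)^\top S^\top S\,(Xw - y) = \beta f(w)
  \quad \text{if } S^\top S = \beta I_n, \\
\tilde f_{A_t}(w) &= \tfrac{1}{2} \sum_{i \in A_t} \lVert S_i (Xw - y) \rVert_2^2
  = \tfrac{1}{2}\,(Xw - y)^\top \Big( \sum_{i \in A_t} S_i^\top S_i \Big) (Xw - y).
\end{aligned}
```

A tight frame makes the full encoded objective a scaled copy of f, so the minimizer is unchanged. Roughly, if the surviving blocks still satisfy \sum_{i \in A_t} S_i^\top S_i \approx (\beta k / m) I_n for k = |A_t| survivors, the partial objective stays close to a scaled copy of f. This is the intuition only; the paper's precise conditions are not restated here.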
Contribution
It proposes an encoding scheme based on equiangular tight frames and proves convergence guarantees for several optimization algorithms (gradient descent, L-BFGS, proximal gradient, and coordinate descent) under arbitrary delay patterns.
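Equiangular tight frames (ETFs) combine two properties: tightness (for N unit-norm rows in R^d, F^T F = (N/d) I) and equiangularity (every |<f_i, f_j>| is equal, which meets the Welch lower bound on coherence). The sketch below checks both on the simplest ETF, the regular simplex of d + 1 vectors in R^d. It is chosen for simplicity and is not the frame family or redundancy level used in the paper; `simplex_etf` and `frame_diagnostics` are hypothetical names.

```python
import numpy as np


def simplex_etf(d):
    """Return a (d+1) x d matrix whose unit-norm rows form the
    regular-simplex equiangular tight frame in R^d."""
    N = d + 1
    # Centering projector onto the d-dim subspace orthogonal to the all-ones vector.
    P = np.eye(N) - np.ones((N, N)) / N
    # eigh sorts eigenvalues ascending: one 0 (all-ones direction), then d ones.
    _, vecs = np.linalg.eigh(P)
    B = vecs[:, 1:]  # orthonormal basis of range(P), shape (N, d)
    # Row i of B holds the coordinates of the centered vector P e_i; normalize it.
    return B / np.linalg.norm(B, axis=1, keepdims=True)


def frame_diagnostics(F):
    """Return (is_tight, is_equiangular, coherence, welch_bound) for the rows of F."""
    N, d = F.shape
    tight = np.allclose(F.T @ F, (N / d) * np.eye(d))
    G = np.abs(F @ F.T)
    off = G[~np.eye(N, dtype=bool)]  # off-diagonal |<f_i, f_j>|
    equiangular = np.allclose(off, off[0])
    welch = np.sqrt((N - d) / (d * (N - 1)))
    return tight, equiangular, off.max(), welch


if __name__ == "__main__":
    print(frame_diagnostics(simplex_etf(4)))
```

For the simplex, the coherence equals the Welch bound 1/d, but the redundancy N/d = (d + 1)/d is small; higher-redundancy ETFs need other constructions.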
Findings
Redundancy encoding improves robustness against stragglers.
The convergence guarantees are deterministic (sample-path) and hold for arbitrary delay patterns, yielding approximate or exact solutions of the original problem.
Experimental results show performance gains over traditional strategies.
Abstract
The performance of distributed optimization and learning systems is bottlenecked by "straggler" nodes and slow communication links, which significantly delay computation. We propose a distributed optimization framework in which the dataset is "encoded" into an over-complete representation with built-in redundancy; straggling nodes are dynamically left out of the computation at every iteration, and their loss is compensated by the embedded redundancy. We show that oblivious application of several popular optimization algorithms on encoded data, including gradient descent, L-BFGS, proximal gradient under data parallelism, and coordinate descent under model parallelism, converges to either approximate or exact solutions of the original problem when stragglers are treated as erasures. These convergence results are deterministic, i.e., they establish sample path convergence for…
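The data-parallel mechanism (encode once, then treat the slowest workers at each iteration as erasures) can be sketched for least squares with gradient descent. Everything here is an assumption for illustration: the encoder is a random tight frame rather than an ETF, worker delays are simulated as i.i.d. exponential draws (the stated guarantees do not depend on any particular delay distribution), and `encoded_gradient_descent`, `m`, `k` and `beta` are hypothetical names. With noiseless y = X w*, every partial objective shares the minimizer w* (provided the kept encoded rows of SX have full column rank), so the iterates recover it; with noisy y they only approximate the least-squares solution.

```python
import numpy as np


def encoded_gradient_descent(X, y, m=8, k=6, beta=2.0, iters=300, seed=0):
    """Least-squares gradient descent on encoded data, using only the k
    fastest of m simulated workers at each iteration (stragglers are erased)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    N = int(beta * n)  # number of encoded rows; beta is the redundancy factor

    # Hypothetical encoder: a random tight frame (not an ETF). Q has orthonormal
    # columns, so S = sqrt(beta) * Q satisfies S^T S = beta * I.
    Q, _ = np.linalg.qr(rng.standard_normal((N, n)))
    S = np.sqrt(beta) * Q
    SX, Sy = S @ X, S @ y

    # Each worker stores one contiguous block of encoded rows.
    blocks = np.array_split(np.arange(N), m)

    # Step 1/L, where L = ||SX||_2^2 bounds the curvature of every partial objective.
    step = 1.0 / np.linalg.norm(SX, 2) ** 2

    w = np.zeros(p)
    for _ in range(iters):
        # Simulated per-worker delays; keep the k fastest, drop the rest.
        delays = rng.exponential(size=m)
        active = np.argsort(delays)[:k]
        rows = np.concatenate([blocks[i] for i in active])
        # Gradient of 0.5 * ||SX[rows] w - Sy[rows]||^2 from the surviving workers.
        residual = SX[rows] @ w - Sy[rows]
        w -= step * (SX[rows].T @ residual)
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 10))
    w_true = rng.standard_normal(10)
    w_hat = encoded_gradient_descent(X, X @ w_true)
    print(np.allclose(w_hat, w_true, atol=1e-6))
```

The fixed step 1/||SX||^2 is a conservative choice: each partial Hessian is dominated by the full encoded Hessian, so the same step is stable whichever workers survive.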
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Machine Learning and ELM
