DSAG: A mixed synchronous-asynchronous iterative method for straggler-resilient learning
Albin Severinson, Eirik Rosnes, Salim El Rouayheb, Alexandre Graell i Amat

TL;DR
This paper introduces DSAG, a mixed synchronous-asynchronous iterative method for straggler-resilient distributed learning. DSAG handles persistent stragglers and runs significantly faster than existing methods in practical cloud-computing scenarios.
Contribution
The paper proposes DSAG, an iterative optimization method that combines synchronous and asynchronous updates. It also introduces a latency model and a dynamic load-balancing strategy, both tailored to persistent stragglers.
Findings
DSAG is up to 50% faster than SAG.
DSAG is more than twice as fast as coded computing methods.
DSAG is effective for principal component analysis of a large genomics dataset and for logistic regression.
Abstract
We consider straggler-resilient learning. In many previous works, e.g., in the coded computing literature, straggling is modeled as random delays that are independent and identically distributed between workers. However, in many practical scenarios, a given worker may straggle over an extended period of time. We propose a latency model that captures this behavior and is substantiated by traces collected on Microsoft Azure, Amazon Web Services (AWS), and a small local cluster. Building on this model, we propose DSAG, a mixed synchronous-asynchronous iterative optimization method, based on the stochastic average gradient (SAG) method, that combines timely and stale results. We also propose a dynamic load-balancing strategy to further reduce the impact of straggling workers. We evaluate DSAG for principal component analysis, cast as a finite-sum optimization problem, of a large genomics…
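The abstract says DSAG builds on SAG and combines timely and stale results. The idea can be illustrated with a minimal Python sketch. It assumes a least-squares finite-sum problem, four workers, and a made-up straggler pattern in which one worker persistently returns results late. It is not the authors' DSAG: it has no latency model, no rule for how long to wait, and no load balancing. The functions `partial_gradient` and `mixed_sag` and the parameter `slow_period` are hypothetical names chosen for illustration.

```python
import random


def partial_gradient(rows, targets, x):
    """Gradient of sum_i 0.5 * (a_i . x - b_i)**2 over one worker's rows."""
    g = [0.0] * len(x)
    for a, b in zip(rows, targets):
        residual = sum(aj * xj for aj, xj in zip(a, x)) - b
        for j, aj in enumerate(a):
            g[j] += residual * aj
    return g


def mixed_sag(rows, targets, num_workers=4, slow_period=5, iters=3000):
    """SAG-style least squares where stale results stand in for stragglers.

    Minimizes f(x) = (1/n) * sum_i 0.5 * (a_i . x - b_i)**2.
    Hypothetical straggler pattern (an assumption, not the paper's model):
    worker 0 is a persistent straggler that returns a fresh gradient only
    every `slow_period` iterations; the other workers respond every iteration.
    """
    n, d = len(rows), len(rows[0])
    # Fixed data partition: worker w holds rows w, w + W, w + 2W, ...
    parts = [list(range(w, n, num_workers)) for w in range(num_workers)]
    # Conservative step 0.2 / L_bound, where L_bound = trace(A^T A) / n >= L.
    l_bound = sum(aj * aj for a in rows for aj in a) / n
    step = 0.2 / l_bound
    x = [0.0] * d
    # SAG memory: the most recent partial gradient received from each worker.
    table = [[0.0] * d for _ in range(num_workers)]
    for t in range(iters):
        for w, idx in enumerate(parts):
            if w != 0 or t % slow_period == 0:
                # Timely result: overwrite this worker's entry in the table.
                table[w] = partial_gradient(
                    [rows[i] for i in idx], [targets[i] for i in idx], x
                )
            # Otherwise the stale entry from an earlier iteration is reused.
        # Combine timely and stale partial gradients into one update direction.
        grad = [sum(table[w][j] for w in range(num_workers)) / n for j in range(d)]
        x = [xj - step * gj for xj, gj in zip(x, grad)]
    return x


# Synthetic data: b = A @ true_x plus small Gaussian noise.
rng = random.Random(0)
true_x = [1.0, 2.0, 3.0]
rows = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(100)]
targets = [sum(a * c for a, c in zip(r, true_x)) + rng.gauss(0.0, 0.01) for r in rows]
x_hat = mixed_sag(rows, targets)
print([round(v, 2) for v in x_hat])
```

In this sketch, the gradient table means an iteration never blocks on the slow worker; its last reported gradient is reused instead. Once the iterates settle, the stale entries match fresh ones, so the fixed point is the exact least-squares solution rather than an approximation.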
Taxonomy
Topics: Statistical Methods and Inference · Gene expression and cancer classification · Stochastic Gradient Optimization Techniques
