DSAG: A mixed synchronous-asynchronous iterative method for straggler-resilient learning

Albin Severinson; Eirik Rosnes; Salim El Rouayheb; Alexandre Graell i Amat

arXiv:2111.13877·cs.DC·November 30, 2021

TL;DR

This paper introduces DSAG, a mixed synchronous-asynchronous iterative method for distributed learning that tolerates persistent stragglers. In the reported cloud computing experiments, DSAG is up to 50% faster than SAG and more than twice as fast as coded computing methods.

Contribution

The paper proposes DSAG, a new iterative optimization algorithm combining synchronous and asynchronous updates, with a latency model and load-balancing strategy tailored for persistent stragglers.

Findings

01 DSAG is up to 50% faster than SAG.

02 DSAG is more than twice as fast as coded computing methods.

03 DSAG is effective on large-scale genomics and logistic regression tasks.

Abstract

We consider straggler-resilient learning. In many previous works, e.g., in the coded computing literature, straggling is modeled as random delays that are independent and identically distributed between workers. However, in many practical scenarios, a given worker may straggle over an extended period of time. We propose a latency model that captures this behavior and is substantiated by traces collected on Microsoft Azure, Amazon Web Services (AWS), and a small local cluster. Building on this model, we propose DSAG, a mixed synchronous-asynchronous iterative optimization method, based on the stochastic average gradient (SAG) method, that combines timely and stale results. We also propose a dynamic load-balancing strategy to further reduce the impact of straggling workers. We evaluate DSAG for principal component analysis, cast as a finite-sum optimization problem, of a large genomics…
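To make the idea of combining timely and stale results concrete, the following is a minimal sketch, not the authors' DSAG implementation. It runs a SAG-style gradient cache on a small least-squares problem with simulated workers, one of which straggles persistently. The function name `sag_with_persistent_straggler`, the parameters `slow_worker` and `slow_period`, the fixed straggling pattern, and the step size are all assumptions made for illustration; the paper's latency model and load-balancing strategy are not reproduced here.

```python
import numpy as np


def sag_with_persistent_straggler(A, b, n_workers=4, slow_worker=0,
                                  slow_period=5, n_iters=1000):
    """Minimize (1/2n)||Ax - b||^2 with a SAG-style gradient cache.

    Illustrative only: the straggler pattern, step size, and all names
    are assumptions for this sketch, not taken from the paper.
    """
    n, d = A.shape
    # Each simulated worker owns one contiguous partition of the rows.
    parts = np.array_split(np.arange(n), n_workers)
    x = np.zeros(d)
    # cache[i] holds the most recent partial gradient received from worker i.
    cache = [np.zeros(d) for _ in range(n_workers)]
    # Conservative step: a quarter of 1/L, where L = ||A||_2^2 / n.
    step = 0.25 * n / np.linalg.norm(A, 2) ** 2

    for k in range(n_iters):
        for i in range(n_workers):
            # Assumed straggler model: the slow worker only delivers a
            # result every `slow_period` iterations; the others are timely.
            if i == slow_worker and k % slow_period != 0:
                continue  # keep the stale cached gradient
            Ai, bi = A[parts[i]], b[parts[i]]
            cache[i] = Ai.T @ (Ai @ x - bi)
        # Combine timely and stale partial gradients into one update.
        x = x - step * sum(cache) / n
    return x


rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5))
x_true = rng.standard_normal(5)
b = A @ x_true  # noiseless, so the minimizer is x_true
x_hat = sag_with_persistent_straggler(A, b)
print(np.linalg.norm(x_hat - x_true))
```

In this sketch the iteration never blocks on the slow worker: its cached partial gradient is simply reused until a new one arrives, which is the sense in which timely and stale results are mixed. A smaller step size is used here because stale gradients generally reduce the range of step sizes for which such iterations remain stable.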

Code & Models

Repositories

severinson/dsag-paper (official)

Taxonomy

Topics: Statistical Methods and Inference · Gene expression and cancer classification · Stochastic Gradient Optimization Techniques