On the Effect of Task-to-Worker Assignment in Distributed Computing Systems with Stragglers

Amir Behrouzi-Far; Emina Soljanin

arXiv:1808.02838 · cs.DC · August 10, 2018

TL;DR

This paper analyzes how task-to-worker assignment affects the expected completion time of distributed computing systems with stragglers. It shows that, when machine service times are independent and exponentially distributed, uniform replication of disjoint task batches minimizes the expected completion time.

Contribution

It shows analytically that system latency depends on both the level of redundancy and the task-to-worker assignment, and that uniform replication of disjoint batches minimizes the expected completion time.
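For intuition, the expected completion time of uniform replication has a short closed form under one simplifying assumption made here for illustration: every worker's service time is Exp(μ) regardless of how many tasks its batch holds (the paper's model may tie the rate to batch size). With n workers, k disjoint batches, and each batch replicated on r = n/k workers, a batch finishes at the minimum of r independent Exp(μ) times, which is Exp(rμ). The job finishes at the maximum of k such independent minima, and the maximum of k i.i.d. Exp(λ) variables has mean H_k/λ:

```latex
\mathbb{E}[T]
  \;=\; \mathbb{E}\Big[\max_{1 \le b \le k}\, \min_{1 \le i \le r} X_{b,i}\Big]
  \;=\; \frac{1}{r\mu} \sum_{j=1}^{k} \frac{1}{j}
  \;=\; \frac{H_k}{r\mu},
  \qquad X_{b,i} \overset{\text{i.i.d.}}{\sim} \operatorname{Exp}(\mu).
```

For example, n = 6 workers split into k = 3 disjoint batches, each replicated r = 2 times with μ = 1, gives E[T] = H_3/2 = 11/12.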

Findings

01. Uniform replication of disjoint batches minimizes the expected completion time.

02. The task-to-worker assignment, not only the amount of redundancy, affects latency.

03. Minimizing latency requires choosing both the redundancy level and the assignment.

Abstract

We study the expected completion time of some recently proposed algorithms for distributed computing which redundantly assign computing tasks to multiple machines in order to tolerate a certain number of machine failures. We analytically show that not only the amount of redundancy but also the task-to-machine assignments affect the latency in a distributed system. We study systems with a fixed number of computing tasks that are split into possibly overlapping batches, and independent exponentially distributed machine service times. We show that, for such systems, the uniform replication of non-overlapping (disjoint) batches of computing tasks achieves the minimum expected computing time.
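To see that the assignment matters even at a fixed redundancy level, the sketch below computes the exact expected completion time for two assignments that both store every task on two workers. It is a minimal illustration under assumptions of our own, not the authors' derivation: worker i returns its whole batch after an independent Exp(mu) time, all batches have the same size, and the job ends once the finished workers' batches together cover every task. The function name `expected_completion_time` and the cyclic (overlapping) assignment are hypothetical choices for this example. The code expands P(T ≤ t) as a polynomial in u = exp(-mu·t) and integrates the survival function exactly, enumerating all 2^n sets of unfinished workers, so it is only practical for small n.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def expected_completion_time(batches, mu=1):
    """Exact E[T] under a simplified, hypothetical model: worker i returns its
    whole batch batches[i] after an independent Exp(mu) time, and the job ends
    once the finished workers' batches together cover every task."""
    n = len(batches)
    tasks = set().union(*batches)
    # coeffs[m] is the coefficient of u**m in P(T <= t), where u = exp(-mu*t)
    # is the probability that a given worker is still running at time t.
    coeffs = [0] * (n + 1)
    for k in range(n + 1):
        for unfinished in combinations(range(n), k):
            finished = [b for i, b in enumerate(batches) if i not in unfinished]
            if set().union(*finished) != tasks:
                continue  # some task has no finished replica yet
            # Add u**k * (1 - u)**(n - k), expanded with the binomial theorem.
            for j in range(n - k + 1):
                coeffs[k + j] += comb(n - k, j) * (-1) ** j
    # E[T] = int_0^inf P(T > t) dt = (1/mu) * int_0^1 (1 - P(u)) / u du,
    # and coeffs[0] == 1 because every task appears in some batch.
    return sum(Fraction(-coeffs[m], m) for m in range(1, n + 1)) / mu


if __name__ == "__main__":
    # Six workers, six tasks, two tasks per worker, every task stored twice.
    disjoint = [{0, 1}, {0, 1}, {2, 3}, {2, 3}, {4, 5}, {4, 5}]
    cyclic = [{i, (i + 1) % 6} for i in range(6)]  # overlapping batches
    print(expected_completion_time(disjoint))  # 11/12
    print(expected_completion_time(cyclic))    # 67/60
```

Under these assumptions the disjoint, uniformly replicated assignment finishes sooner on average (11/12 versus 67/60), which is consistent with the abstract's claim. The exact rational arithmetic avoids the sampling noise a Monte Carlo simulation would introduce.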
