On the Effect of Task-to-Worker Assignment in Distributed Computing Systems with Stragglers
Amir Behrouzi-Far, Emina Soljanin

TL;DR
This paper analyzes how task-to-worker assignment strategies impact the expected completion time in distributed systems with stragglers, revealing that uniform replication of disjoint task batches minimizes latency.
Contribution
It shows analytically that latency depends on both the amount of redundancy and the task-to-machine assignment, and that uniform replication of disjoint batches minimizes the expected completion time when machine service times are independent and exponentially distributed.
Findings
Uniform replication of disjoint batches minimizes latency.
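Latency depends on the task-to-machine assignment, not only on the amount of redundancy.
The result holds for a fixed number of tasks split into possibly overlapping batches, with independent exponentially distributed machine service times.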
Abstract
We study the expected completion time of some recently proposed algorithms for distributed computing which redundantly assign computing tasks to multiple machines in order to tolerate a certain number of machine failures. We analytically show that not only the amount of redundancy but also the task-to-machine assignments affect the latency in a distributed system. We study systems with a fixed number of computing tasks that are split in possibly overlapping batches, and independent exponentially distributed machine service times. We show that, for such systems, the uniform replication of non-overlapping (disjoint) batches of computing tasks achieves the minimum expected computing time.
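The claim about uniform replication can be illustrated numerically for the disjoint-batch case. The sketch below is not taken from the paper. It assumes each worker finishes its batch after an independent Exp(mu) time, that a batch is done when its fastest replica finishes, and that the job is done when every batch is done. The function names, the rate parameter `mu`, and the restriction to disjoint batches are illustrative assumptions. Overlapping assignments, which the paper also covers, are not modeled here.

```python
from itertools import combinations


def expected_completion_time(replicas, mu=1.0):
    """Exact E[max_i T_i] where T_i ~ Exp(mu * replicas[i]) are independent.

    T_i is the finishing time of batch i: the minimum of replicas[i]
    independent Exp(mu) worker times (assumed model, see lead-in).
    Uses inclusion-exclusion: E[max] = sum over nonempty subsets S of
    (-1)^(|S|+1) / (mu * sum of replicas in S).
    """
    total = 0.0
    for k in range(1, len(replicas) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(replicas, k):
            total += sign / (mu * sum(subset))
    return total


def replica_profiles(n_workers, n_batches, max_part=None):
    """Yield non-increasing tuples of positive replica counts.

    Each tuple has n_batches entries summing to n_workers, so every worker
    holds exactly one batch and every batch has at least one replica.
    """
    if max_part is None:
        max_part = n_workers
    if n_batches == 0:
        if n_workers == 0:
            yield ()
        return
    # Leave at least one worker for each of the remaining batches.
    largest = min(max_part, n_workers - (n_batches - 1))
    for first in range(largest, 0, -1):
        for rest in replica_profiles(n_workers - first, n_batches - 1, first):
            yield (first,) + rest


def best_profile(n_workers, n_batches, mu=1.0):
    """Return the replica profile with the smallest expected completion time."""
    return min(
        replica_profiles(n_workers, n_batches),
        key=lambda profile: expected_completion_time(profile, mu),
    )


if __name__ == "__main__":
    for profile in replica_profiles(6, 3):
        print(profile, round(expected_completion_time(profile), 4))
```

With 6 workers and 3 batches under these assumptions, the uniform profile (2, 2, 2) gives an expected completion time of 11/12, compared with 73/60 for (3, 2, 1) and 91/60 for (4, 1, 1), which is consistent with the abstract's statement for disjoint batches.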
