Delay-optimal policies in partial fork-join systems with redundancy and random slowdowns

Martin Zubeldia

arXiv:1910.09602·cs.PF·October 23, 2019

TL;DR

This paper analyzes delay-optimal policies in large distributed systems with redundancy and random slowdowns, showing how to minimize job delay asymptotically by adjusting the number of replicas based on server slowdown characteristics and task sizes.

Contribution

It introduces models for server slowdowns and derives asymptotically delay-minimizing policies that depend on slowdown distributions and task sizes.

Findings

01

Optimal number of replicas depends on arrival rate and expected slowdown.

02

Delay is minimized when replicas are allocated based on task size and slowdown variability.

03

Asymptotic delay bounds are established for different slowdown models.

Abstract

We consider a large distributed service system consisting of $n$ homogeneous servers with infinite capacity FIFO queues. Jobs arrive as a Poisson process of rate $\lambda n / k_{n}$ (for some positive constant $\lambda$ and integer $k_{n}$). Each incoming job consists of $k_{n}$ identical tasks that can be executed in parallel, and that can be encoded into at least $k_{n}$ "replicas" of the same size (by introducing redundancy) so that the job is considered to be completed when any $k_{n}$ replicas associated with it finish their service. Moreover, we assume that servers can experience random slowdowns in their processing rate so that the service time of a replica is the product of its size and a random slowdown. First, we assume that the server slowdowns are shifted exponential and independent of the replica sizes. In this setting we show that the delay of a typical job is asymptotically…
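The service-time model in the abstract can be illustrated with a small Monte Carlo sketch. This is not the paper's analysis: it ignores queueing and the Poisson arrival stream entirely, holds the replica size fixed as redundancy grows, and only samples how long a single job takes when its completion time is the $k$-th smallest of $r$ replica service times. The function names and all parameter values (`shift`, `rate`, `replica_size`, `trials`) are hypothetical choices for illustration.

```python
import random


def sample_slowdown(shift, rate, rng):
    """Draw one shifted-exponential slowdown: a constant shift plus Exp(rate)."""
    return shift + rng.expovariate(rate)


def job_service_latency(k, r, replica_size, shift, rate, rng):
    """Time until any k of r replicas finish, assuming no queueing at all."""
    if r < k:
        raise ValueError("need at least k replicas")
    # Service time of a replica = its size times an independent random slowdown.
    finish_times = sorted(
        replica_size * sample_slowdown(shift, rate, rng) for _ in range(r)
    )
    # The job completes at the k-th smallest replica finishing time.
    return finish_times[k - 1]


def mean_latency(k, r, replica_size, shift, rate, trials=2000, seed=0):
    """Average the sampled job latency over independent trials."""
    rng = random.Random(seed)
    total = sum(
        job_service_latency(k, r, replica_size, shift, rate, rng)
        for _ in range(trials)
    )
    return total / trials


if __name__ == "__main__":
    # Illustrative values only: k = 10 tasks, r = 10, 15, 20 replicas.
    for r in (10, 15, 20):
        print(r, mean_latency(k=10, r=r, replica_size=1.0, shift=1.0, rate=1.0))
```

Because the sketch leaves out queues, it shows only the benefit side of redundancy (waiting for the fastest $k$ of $r$ replicas); the extra load that additional replicas place on the servers, which the paper's delay-optimal policies have to balance against that benefit, is not modeled here.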

Taxonomy

Topics: Advanced Queuing Theory Analysis · Age of Information Optimization · Distributed systems and fault tolerance