Delay Asymptotics and Bounds for Multi-Task Parallel Jobs
Weina Wang, Mor Harchol-Balter, Haotian Jiang, Alan Scheller-Wolf, R. Srikant

TL;DR
This paper analyzes the delay of multi-task jobs in parallel server systems, establishing asymptotic independence of task delays as the number of servers grows, and providing bounds on job delay using a novel Poisson oversampling technique.
Contribution
It establishes asymptotic independence results in a regime where the number of tasks per job grows along with the number of servers, and develops a new Poisson oversampling method to bound job delay.
Findings
Job delay converges to the maximum of independent task delays under certain conditions.
Asymptotic independence holds when the number of tasks per job grows more slowly than the fourth root of the number of servers.
Poisson oversampling converts delay analysis into a positive correlation balls-and-bins problem.
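
To make the first finding concrete, here is a minimal sketch of what "the maximum of independent task delays" looks like numerically. It relies on assumptions that are not taken from the paper: each task is treated as seeing its own M/M/1 FIFO queue with per-server arrival rate `lam` and service rate `mu`, so each task delay is exponential with rate `mu - lam`. The function names are hypothetical and chosen only for illustration.

```python
import random


def harmonic(k):
    """Return the k-th harmonic number H_k = 1 + 1/2 + ... + 1/k."""
    return sum(1.0 / i for i in range(1, k + 1))


def expected_job_delay_independent(k, lam, mu):
    """Mean of the max of k i.i.d. task delays.

    Assumption (illustrative, not from the paper): each task delay is the
    sojourn time of an M/M/1 FIFO queue, i.e. exponential with rate mu - lam.
    The mean of the max of k i.i.d. Exp(r) variables is H_k / r.
    """
    if not 0 <= lam < mu:
        raise ValueError("need 0 <= lam < mu for a stable queue")
    return harmonic(k) / (mu - lam)


def simulate_job_delay(k, lam, mu, trials=20000, seed=0):
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    rate = mu - lam
    total = 0.0
    for _ in range(trials):
        # A job finishes only when its slowest task finishes.
        total += max(rng.expovariate(rate) for _ in range(k))
    return total / trials


if __name__ == "__main__":
    for k in (1, 2, 10, 50):
        exact = expected_job_delay_independent(k, lam=0.5, mu=1.0)
        approx = simulate_job_delay(k, lam=0.5, mu=1.0)
        print(f"k={k:3d}  closed form={exact:.3f}  simulated={approx:.3f}")
```

Under these assumptions the mean job delay grows like the harmonic number of the task count, roughly logarithmically in the number of tasks. The sketch models only the independent case; it does not capture the correlation among real task delays that the paper analyzes.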
Abstract
We study delay of jobs that consist of multiple parallel tasks, which is a critical performance metric in a wide range of applications such as data file retrieval in coded storage systems and parallel computing. In this problem, each job is completed only when all of its tasks are completed, so the delay of a job is the maximum of the delays of its tasks. Despite the wide attention this problem has received, tight analysis is still largely unknown since analyzing job delay requires characterizing the complicated correlation among task delays, which is hard to do. We first consider an asymptotic regime where the number of servers, $n$, goes to infinity, and the number of tasks in a job, $k^{(n)}$, is allowed to increase with $n$. We establish the asymptotic independence of any $k^{(n)}$ queues under the condition $k^{(n)} = o(n^{1/4})$. This greatly generalizes the…
Taxonomy
Topics: Advanced Queuing Theory Analysis · Probability and Risk Models · Distributed systems and fault tolerance
