Scheduling Dags under Uncertainty
Grzegorz Malewicz

TL;DR
This paper studies a parallel scheduling problem for DAGs on unreliable workers, providing polynomial algorithms under certain restrictions and proving NP-hardness and inapproximability results in more general cases.
Contribution
It introduces a new scheduling model with probabilistic worker reliability, offers a polynomial solution for restricted cases, and establishes complexity bounds for general scenarios.
Findings
Polynomial algorithm for fixed dag width and worker count
NP-hardness when either dag width or worker count grows
Inapproximability within 5/4 factor for general case
Abstract
This paper introduces a parallel scheduling problem where a directed acyclic graph modeling tasks and their dependencies needs to be executed on unreliable workers. Each worker executes a given task correctly with a certain probability. The goal is to find a regimen that dictates how workers get assigned to tasks (possibly in parallel and redundantly) throughout execution, so as to minimize the expected completion time. This fundamental parallel scheduling problem arises in grid computing and project management, and has several applications. We show a polynomial time algorithm for the problem restricted to the case when dag width is at most a constant and the number of workers is also at most a constant. These two restrictions may appear to be too severe. However, they are fundamentally required. Specifically, we demonstrate that the problem is NP-hard with…
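To make the objective concrete, here is a minimal Python sketch that evaluates the expected completion time of one given regimen on a small dag. It is not the paper's algorithm. It rests on assumptions the abstract does not state: execution proceeds in unit-time steps, worker outcomes are independent, and a failed attempt is detected so the task can be retried. The names `expected_time`, `preds`, `p`, and `all_on_first_eligible` are hypothetical, introduced only for illustration.

```python
from itertools import product


def expected_time(tasks, preds, p, regimen):
    """Expected number of unit-time steps until every task has succeeded.

    tasks   : list of task names
    preds   : dict task -> set of tasks that must succeed first
    p       : dict worker -> dict task -> success probability (assumed independent)
    regimen : function mapping the frozenset of completed tasks to a
              dict worker -> task (the assignment for the next step)
    """
    all_done = frozenset(tasks)
    memo = {}

    def E(done):
        if done == all_done:
            return 0.0
        if done in memo:
            return memo[done]
        # Probability that every worker assigned to a task fails in this step.
        fail = {}
        for w, t in regimen(done).items():
            assert t not in done and preds[t] <= done, "task is not eligible"
            fail[t] = fail.get(t, 1.0) * (1.0 - p[w][t])
        attempted = list(fail)
        stay = 1.0   # probability that no attempted task succeeds
        total = 1.0  # one step is spent regardless of the outcome
        for outcome in product([False, True], repeat=len(attempted)):
            prob = 1.0
            newly = set()
            for t, ok in zip(attempted, outcome):
                prob *= (1.0 - fail[t]) if ok else fail[t]
                if ok:
                    newly.add(t)
            if not newly:
                stay = prob
            elif prob > 0.0:
                total += prob * E(done | newly)
        if stay >= 1.0:
            raise ValueError("regimen makes no progress from this state")
        # Solve E = total + stay * E for E.
        memo[done] = total / (1.0 - stay)
        return memo[done]

    return E(frozenset())


# Hypothetical instance: a chain a -> b and two workers that each succeed
# with probability 0.5 on either task.
tasks = ["a", "b"]
preds = {"a": set(), "b": {"a"}}
p = {0: {"a": 0.5, "b": 0.5}, 1: {"a": 0.5, "b": 0.5}}


def all_on_first_eligible(done):
    """Toy regimen: every worker redundantly attempts the first eligible task."""
    for t in tasks:
        if t not in done and preds[t] <= done:
            return {w: t for w in p}
    return {}


result = expected_time(tasks, preds, p, all_on_first_eligible)
```

In this toy instance each step succeeds with probability 1 - 0.5 · 0.5 = 0.75, so each task takes 1/0.75 steps in expectation and the chain takes 8/3 steps in total. The sketch enumerates all sets of completed tasks, so it is only practical for very small dags; the scheduling problem itself asks for the regimen that minimizes this quantity.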
Taxonomy
Topics: Distributed and Parallel Computing Systems · Scheduling and Optimization Algorithms · Interconnection Networks and Systems
