Fault-tolerant parallel scheduling of arbitrary length jobs on a shared channel
Marek Klonowski, Dariusz R. Kowalski, Jarosław Mirek, Prudence W.H. Wong

TL;DR
This paper investigates fault-tolerant parallel job scheduling on shared channels, analyzing how preemption and adversarial failures affect performance, and establishing bounds and algorithms for different failure and preemption scenarios.
Contribution
It introduces a comprehensive analysis of scheduling arbitrary length jobs with machine failures and preemption, identifying features that influence problem complexity and providing bounds and algorithms.
Findings
Preemption significantly impacts scheduling difficulty.
Randomization benefits are limited to non-adaptive adversaries.
The problem's complexity varies with adversary severity and preemption ability.
Abstract
We study the problem of scheduling jobs on fault-prone machines communicating via a shared channel, also known as a multiple-access channel. We have arbitrary-length jobs to be scheduled on identical machines, some of which are prone to crashes caused by an adversary. A machine can use the channel, which has no collision detection, to inform other machines when a job is completed. Performance is measured by the total number of available machine steps during the whole execution. Our goal is to study the impact of preemption (i.e., interrupting the execution of a job and resuming it later on the same or a different machine) and failures on the work performance of job processing. The novelty is the ability to identify the features that determine the complexity (difficulty) of the problem. We show that the problem becomes difficult when preemption is not allowed, by showing corresponding lower and upper…
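The work measure described in the abstract (total available machine steps over the execution) can be illustrated with a minimal sketch. This is not the paper's algorithm. The function name `total_work`, the `crash_round` encoding, and the convention that a machine crashed at round `c` is unavailable from round `c` onward are all assumptions made for illustration.

```python
def total_work(crash_round, rounds):
    """Count available machine steps over `rounds` synchronous rounds.

    crash_round[i] is the round at which machine i crashes (hypothetical
    encoding), or None if it never crashes. A machine that crashes at
    round c is assumed to contribute no steps from round c onward.
    """
    work = 0
    for t in range(rounds):
        # Each machine still operational in round t contributes one step,
        # whether or not it makes progress on a job in that round.
        work += sum(1 for c in crash_round if c is None or c > t)
    return work


# Three machines over four rounds; the second machine crashes at round 2.
example = total_work([None, 2, None], 4)
```

Under these assumptions, idle steps of operational machines still count toward the work, which is why failures and the inability to preempt can both inflate the measure.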
