Online Distributed Scheduling on a Fault-prone Parallel System
Elli Zavou, Antonio Fernández Anta

TL;DR
This paper studies online distributed scheduling algorithms for a fault-prone parallel system. It presents algorithms with optimal completed-load competitiveness when the number of different task sizes is bounded, and shows that resource augmentation improves performance.
Contribution
It introduces optimal online algorithms for scheduling tasks on fault-prone machines when the number of different task sizes is bounded, and analyzes the impact of resource speedup on competitiveness.
Findings
Optimal algorithms for two and for k different task sizes, with task arrivals and machine crashes controlled by an adversary.
Competitiveness improves with increased resource speedup.
A bound on the number of different task sizes is essential: without it, competitiveness is not achievable.
Abstract
We consider a parallel system of identical machines prone to unpredictable crashes and restarts, trying to cope with the continuous arrival of tasks to be executed. Tasks have different computational requirements (i.e., processing time or size). The flow of tasks, their size, and the crash and restart of the machines are assumed to be controlled by an adversary. Then, we focus on the study of online distributed algorithms for the efficient scheduling of the tasks. We use competitive analysis, considering as efficiency metric the completed-load, i.e., the aggregated size of the completed tasks. We first present optimal completed-load competitiveness algorithms when the number of different task sizes that can be injected by the adversary is bounded. (It is known that, if it is not bounded, competitiveness is not achievable.) We first consider only two different task sizes, and then…
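The completed-load metric named in the abstract can be illustrated with a minimal Python sketch. The names `Task`, `completed_load`, and `load_ratio` are hypothetical and do not come from the paper. The ratio compares one run of an algorithm against one reference schedule, so it is only an illustration of the metric and not the worst-case competitive ratio, which is taken over all adversarial patterns. Returning 1.0 when the reference completes nothing is an assumed convention.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Task:
    """Hypothetical task record: an identifier and a size (processing time)."""
    task_id: int
    size: float


def completed_load(completed_tasks: Iterable[Task]) -> float:
    """Completed-load: the aggregated size of the completed tasks."""
    return sum(task.size for task in completed_tasks)


def load_ratio(alg_completed: Iterable[Task], ref_completed: Iterable[Task]) -> float:
    """Completed load of an algorithm divided by that of a reference schedule.

    Assumption: if the reference completed nothing, return 1.0.
    """
    ref_load = completed_load(ref_completed)
    if ref_load == 0:
        return 1.0
    return completed_load(alg_completed) / ref_load


# Example run: three injected tasks with two different sizes (1 and 3).
small_a = Task(task_id=0, size=1.0)
small_b = Task(task_id=1, size=1.0)
large = Task(task_id=2, size=3.0)

# Suppose crashes let the online algorithm finish only the two small tasks,
# while a reference schedule finishes the large task and one small task.
alg_done = [small_a, small_b]
ref_done = [large, small_a]

alg_load = completed_load(alg_done)
ref_load = completed_load(ref_done)
ratio = load_ratio(alg_done, ref_done)
```

In this made-up run the algorithm completes a load of 2 against a reference load of 4, so the ratio is 0.5. Measuring load rather than task count is what makes the choice between small and large tasks matter when machines can crash mid-execution.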
Taxonomy
Topics: Optimization and Search Problems · Distributed and Parallel Computing Systems · Scheduling and Optimization Algorithms
