Data-Locality-Aware Task Assignment and Scheduling for Distributed Job Executions
Hailiang Zhao, Xueyan Tang, Peng Chen, Jianwei Yin, Shuiguang Deng

TL;DR
This paper introduces new algorithms for data-locality-aware task assignment and scheduling in distributed systems, aiming to minimize job completion times efficiently without prior knowledge of job arrivals.
Contribution
It presents the Optimal Balanced Task Assignment (OBTA), extends the Water-Filling algorithm with proven approximation bounds, and introduces the Replica-Deletion heuristic for improved scheduling.
Findings
OBTA achieves minimal job completion times while reducing computational overhead by narrowing the solution search space.
The extended Water-Filling (WF) algorithm has an approximation factor equal to the number of task groups in a job.
The Replica-Deletion (RD) heuristic outperforms WF by leveraging global optimization techniques.
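To make the water-filling idea concrete, here is a minimal sketch and not the paper's algorithm. It assumes unit-time tasks, integer server backlogs, and task groups processed one at a time, each group placing its tasks on the least-loaded server that holds its data. The names `water_fill`, `backlog`, and `groups` are hypothetical and do not come from the paper.

```python
import heapq

def water_fill(backlog, groups):
    """Greedy water-filling style assignment (illustrative simplification).

    backlog: dict mapping server name -> queued time units before this job.
    groups:  list of (num_tasks, eligible_servers); a group's tasks may only
             run on servers holding a replica of that group's data.
    Returns (assignment, completion_time).
    """
    load = dict(backlog)  # copy so the caller's backlog is not mutated
    assignment = []
    for num_tasks, servers in groups:
        # Min-heap keyed by current load: the lowest "water level" is filled first.
        heap = [(load[s], s) for s in servers]
        heapq.heapify(heap)
        placed = {s: 0 for s in servers}
        for _ in range(num_tasks):
            level, s = heapq.heappop(heap)
            placed[s] += 1
            heapq.heappush(heap, (level + 1, s))  # assumption: unit-time tasks
        for s, n in placed.items():
            load[s] += n
        assignment.append(placed)
    # The job completes when the last server that received any of its tasks finishes.
    used = [load[s] for placed in assignment for s, n in placed.items() if n > 0]
    return assignment, max(used, default=0)

# Hypothetical example: three servers, two task groups with overlapping replicas.
example_backlog = {"a": 0, "b": 2, "c": 1}
example_groups = [(3, ["a", "b"]), (2, ["b", "c"])]
example_assignment, example_completion = water_fill(example_backlog, example_groups)
```

Because each group is filled without looking ahead to later groups, this greedy order can be suboptimal, which is consistent with the approximation factor depending on the number of task groups.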
Abstract
This paper addresses the data-locality-aware task assignment and scheduling problem for distributed job executions. Our goal is to minimize job completion times without prior knowledge of future job arrivals. We propose an Optimal Balanced Task Assignment algorithm (OBTA), which achieves minimal job completion times while significantly reducing computational overhead through efficient narrowing of the solution search space. To balance performance and efficiency, we extend the approximate Water-Filling (WF) algorithm, providing a rigorous proof that its approximation factor equals the number of task groups in a job. We also introduce a novel heuristic, Replica-Deletion (RD), which outperforms WF by leveraging global optimization techniques. To further enhance scheduling efficiency, we incorporate job ordering strategies based on a shortest-estimated-time-first policy, reducing average…
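The shortest-estimated-time-first ordering mentioned in the abstract can be illustrated with a small sketch. It assumes a single queue in which jobs run one after another and that an estimated duration is already available for each job; how the paper derives its estimates is not stated here, so `order_jobs`, `average_completion`, and the sample numbers are hypothetical.

```python
def order_jobs(estimates):
    """Return job ids sorted by ascending estimated duration.

    estimates: dict mapping job id -> estimated duration (assumed to be given).
    """
    return sorted(estimates, key=lambda job: estimates[job])

def average_completion(durations):
    """Average completion time when jobs run back to back in the given order."""
    clock = 0
    total = 0
    for d in durations:
        clock += d      # this job finishes once all earlier jobs plus itself are done
        total += clock
    return total / len(durations)

# Hypothetical estimates, listed in arrival order.
example_estimates = {"j1": 5, "j2": 1, "j3": 3}
fifo_order = list(example_estimates)
setf_order = order_jobs(example_estimates)

fifo_avg = average_completion([example_estimates[j] for j in fifo_order])
setf_avg = average_completion([example_estimates[j] for j in setf_order])
```

In this toy setting, running shorter jobs first lowers the average completion time relative to arrival order, because fewer jobs wait behind a long one.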
Taxonomy
Topics: Distributed and Parallel Computing Systems · Cloud Computing and Resource Management · Distributed systems and fault tolerance
