Task-Cloning Algorithms in a MapReduce Cluster with Competitive Performance Bounds
Huanle Xu, Wing Cheong Lau

TL;DR
This paper introduces task cloning algorithms for MapReduce job scheduling that achieve competitive performance bounds and significantly reduce job flowtimes, especially for small jobs, by addressing stragglers and resource bottlenecks.
Contribution
The paper proposes novel online and offline scheduling algorithms based on SRPT with task cloning, providing competitive bounds and practical improvements over existing schemes.
Findings
SRPTMS+C reduces job flowtimes by up to 25% in simulations.
The offline algorithm is 2-competitive under low-variance conditions.
SRPTMS+C is $(1+\epsilon)$-speed $o(1/\epsilon^2)$-competitive in minimizing weighted flowtimes.
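The last finding uses resource-augmentation notation. As a reminder of the standard definition (the symbols $A$, $\mathrm{OPT}$, $I$, $F$, $w_j$, $a_j$, and $c_j$ are notation assumed here, not taken from the paper): if job $j$ has weight $w_j$, arrival time $a_j$, and completion time $c_j$ under a given schedule, the weighted flowtime of that schedule on instance $I$ is

$$F(I) = \sum_{j} w_j \,(c_j - a_j)$$

An online algorithm $A$ is then called $s$-speed $c$-competitive if, for every instance $I$,

$$F_{A,\,s}(I) \;\le\; c \cdot F_{\mathrm{OPT},\,1}(I)$$

where $F_{A,\,s}$ is the objective achieved by $A$ on machines running at speed $s$ and $F_{\mathrm{OPT},\,1}$ is the optimal offline objective on unit-speed machines. In the finding above, $s = 1+\epsilon$.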
Abstract
Job scheduling for a MapReduce cluster has been an active research topic in recent years. However, measurement traces from real-world production environments show that the durations of tasks within a job vary widely. The overall elapsed time of a job, i.e., the so-called flowtime, is often dictated by one or a few slow-running tasks within the job, generally referred to as "stragglers". The causes of stragglers include tasks running on partially or intermittently failing machines and localized resource bottlenecks within a MapReduce cluster. To tackle this online job scheduling challenge, we adopt the task cloning approach and design corresponding scheduling algorithms that aim to minimize the weighted sum of job flowtimes in a MapReduce cluster, based on the Shortest Remaining Processing Time (SRPT) scheduler. To be more specific, we first design a 2-competitive…
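The straggler effect described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm. It assumes, for simplicity, that all tasks of a job and all of their clones start at the same time, and that a task finishes as soon as its fastest clone finishes. The function name `job_flowtime` and the sample durations are hypothetical.

```python
from typing import List


def job_flowtime(arrival: float, start: float,
                 clone_durations: List[List[float]]) -> float:
    """Flowtime of one job under simplifying assumptions (hypothetical model).

    clone_durations[i] lists the running time of every copy of task i.
    A task completes when its fastest copy completes, and the job
    completes when its slowest task completes.
    """
    # Each task finishes with its fastest clone.
    task_finish = [min(clones) for clones in clone_durations]
    # The job is held up by its slowest task (the straggler).
    return start + max(task_finish) - arrival


# Three tasks, one of which is a straggler taking 20 time units.
no_cloning = job_flowtime(0.0, 1.0, [[2.0], [3.0], [20.0]])

# Same job, but the third task has a clone that takes only 4 time units.
with_cloning = job_flowtime(0.0, 1.0, [[2.0], [3.0], [20.0, 4.0]])

print(no_cloning, with_cloning)  # prints "21.0 5.0"
```

In this toy model a single slow task sets the flowtime of the whole job, and one extra copy of that task is enough to remove the delay. The sketch ignores the cost of the extra slot a clone occupies, which is the trade-off a cloning scheduler has to manage.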
Taxonomy
Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Distributed and Parallel Computing Systems
