Hybrid Job-driven Scheduling for Virtual MapReduce Clusters
Ming-Chang Lee, Jia-Chun Lin, Ramin Yahyapour

TL;DR
This paper introduces a hybrid job-driven scheduling scheme (JoSS) for virtual MapReduce clusters, improving data locality and job performance by classifying jobs and applying tailored scheduling policies.
Contribution
It proposes a novel hybrid scheduling scheme (JoSS) that classifies jobs and optimizes scheduling at multiple levels for virtual MapReduce environments, with two variations tailored for different workloads.
Findings
JoSS outperforms existing scheduling algorithms in data locality and network overhead.
Its two variations each suit a different workload scenario.
It improves job performance without incurring significant overhead.
Abstract
It is cost-efficient for a tenant with a limited budget to establish a virtual MapReduce cluster by renting multiple virtual private servers (VPSs) from a VPS provider. To provide an appropriate scheduling scheme for this type of computing environment, we propose in this paper a hybrid job-driven scheduling scheme (JoSS for short) from a tenant's perspective. JoSS provides not only job-level scheduling, but also map-task-level scheduling and reduce-task-level scheduling. JoSS classifies MapReduce jobs based on job scale and job type and designs an appropriate scheduling policy for each class of jobs. The goal is to improve data locality for both map tasks and reduce tasks, avoid job starvation, and improve job execution performance. Two variations of JoSS are further introduced to separately achieve better map-data locality and faster task assignment. We conduct extensive…
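The classification step described in the abstract (grouping jobs by scale and by type, then choosing a policy per class) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the use of input block count as the measure of scale, the map-output-to-input ratio as the measure of type, both threshold values, and the class labels.

```python
from dataclasses import dataclass
from enum import Enum


class Scale(Enum):
    SMALL = "small"
    LARGE = "large"


class JobType(Enum):
    MAP_HEAVY = "map-heavy"
    REDUCE_HEAVY = "reduce-heavy"


@dataclass(frozen=True)
class Job:
    name: str
    input_blocks: int        # hypothetical measure of job scale
    map_output_ratio: float  # hypothetical measure of job type: map output size / input size


def classify(job, scale_threshold=8, type_threshold=1.0):
    """Return (Scale, JobType) for a job. Both thresholds are illustrative assumptions."""
    scale = Scale.SMALL if job.input_blocks <= scale_threshold else Scale.LARGE
    # A job that shuffles more data than it reads is treated as reduce-heavy here.
    job_type = (
        JobType.REDUCE_HEAVY
        if job.map_output_ratio > type_threshold
        else JobType.MAP_HEAVY
    )
    return scale, job_type


def pick_policy(job):
    """Map a job to a per-class policy label (labels are hypothetical)."""
    scale, job_type = classify(job)
    return f"{scale.value}/{job_type.value}"


if __name__ == "__main__":
    for job in (Job("wordcount", 4, 0.2), Job("join", 64, 1.5)):
        print(job.name, "->", pick_policy(job))
```

A scheduler built this way would look up a separate queueing and placement routine under each label, which is the sense in which the scheme is "job-driven": the job's class, rather than a single global rule, selects the policy.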
Taxonomy
Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Distributed and Parallel Computing Systems
