Scheduling Data Intensive Workloads through Virtualization on MapReduce based Clouds
B. Thirumala Rao, L. S. S. Reddy

TL;DR
This paper introduces a novel dynamic scheduler for MapReduce workloads in cloud environments that optimizes resource allocation and data locality to meet job deadlines more efficiently.
Contribution
It proposes a resource-aware scheduling approach with dynamic resource reconfiguration to improve deadline adherence and system throughput in MapReduce clouds.
Findings
Achieves an approximately 12% increase in job throughput.
Effectively balances data locality and resource utilization.
Outperforms the Fair Scheduler in experiments.
Abstract
MapReduce has become a popular programming model for running data-intensive applications on the cloud. Completion time goals or deadlines of MapReduce jobs set by users are becoming crucial in existing cloud-based data processing environments like Hadoop. There is a conflict between scheduling MapReduce jobs to meet deadlines and "data locality" (assigning tasks to nodes that contain their input data). To meet the deadline, a task may be scheduled on a node without local input data for that task, causing expensive data transfer from a remote node. In this paper, a novel scheduler is proposed to address this problem; it is primarily based on a dynamic resource reconfiguration approach. It has two components: 1) Resource Predictor: which dynamically determines the number of Map/Reduce slots that every job requires to meet its completion time guarantee; 2) Resource Reconfigurator: that…
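To make the Resource Predictor idea concrete, here is a minimal sketch of how a slot requirement could be derived from a deadline. It is not the paper's formula, which the abstract does not give. It assumes, purely for illustration, that all tasks of a job take the same average time and run in full "waves" across the allocated slots; the function name `required_slots` and its parameters are hypothetical.

```python
import math


def required_slots(num_tasks: int, avg_task_seconds: float, deadline_seconds: float) -> int:
    """Estimate the smallest number of slots that finishes all tasks by the deadline.

    Illustrative assumption (not from the paper): every task takes
    avg_task_seconds, and tasks execute in waves of one task per slot.
    """
    if num_tasks <= 0:
        return 0
    if avg_task_seconds <= 0:
        raise ValueError("avg_task_seconds must be positive")
    if deadline_seconds < avg_task_seconds:
        raise ValueError("deadline is shorter than a single task")

    # Number of complete waves that fit inside the deadline.
    waves_allowed = int(deadline_seconds // avg_task_seconds)
    # Each wave runs one task per slot, so spread the tasks over the waves.
    return math.ceil(num_tasks / waves_allowed)


if __name__ == "__main__":
    # 100 map tasks of about 30 s each with a 300 s deadline.
    print(required_slots(100, 30, 300))
```

A real predictor would also have to account for the cost of non-local tasks, since a task scheduled away from its input data runs longer because of the remote transfer the abstract describes.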
