Hadoop Scheduling Base On Data Locality

Bo Jiang; Jiaying Wu; Xiuyu Shi; Ruhuan Huang

arXiv:1506.00425 · cs.DC · June 2, 2015 · 5 cites

Open Access

TL;DR

This paper proposes a data locality-aware job scheduling algorithm for Hadoop that uses resource prefetching to improve data locality and reduce job completion time.

Contribution

It introduces a novel scheduling algorithm incorporating resource prefetching based on estimated task completion times to enhance data locality in Hadoop.

Findings

01. Improved data locality in Hadoop scheduling.

02. Reduced job completion time with resource prefetching.

03. Effective preselection of non-local map tasks for prefetching.

Abstract

In Hadoop, job scheduling is an independent module, so users can design their own job scheduler around their actual application requirements and thereby meet their specific business needs. Hadoop currently has three schedulers: FIFO, the Capacity Scheduler, and the Fair Scheduler. All of them use task allocation strategies that consider data locality only in a simple way. They neither support data locality well nor apply fully to all job scheduling cases. In this paper, we take the concept of resource prefetching into consideration and propose a job scheduling algorithm based on data locality. By estimating the remaining time to complete a task and comparing it with the time to transfer a resource block, we preselect candidate nodes for task allocation. We then preselect non-local map tasks from the unfinished job queue as resource-prefetch tasks. Getting information of resources…
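
The candidate-node preselection described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation; several details are assumptions, because the abstract does not give them. Remaining time is estimated by linear extrapolation from task progress. Transfer time is taken as block size divided by a fixed bandwidth. A node counts as a candidate when its running task's remaining time is at least the transfer time, so that a prefetched block could arrive before the slot frees up. The names `RunningTask`, `estimate_remaining_s`, `transfer_time_s`, and `preselect_candidate_nodes` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunningTask:
    """Hypothetical view of a task currently occupying a slot on a node."""
    node: str         # name of the node running the task
    progress: float   # fraction of work completed, between 0 and 1
    elapsed_s: float  # seconds the task has been running so far


def estimate_remaining_s(task: RunningTask) -> float:
    """Assumed estimator: linear extrapolation at a constant progress rate."""
    if task.progress <= 0:
        return float("inf")  # no progress yet, so no finite estimate
    return task.elapsed_s * (1.0 - task.progress) / task.progress


def transfer_time_s(block_mb: float, bandwidth_mb_per_s: float) -> float:
    """Assumed transfer model: block size over a fixed link bandwidth."""
    return block_mb / bandwidth_mb_per_s


def preselect_candidate_nodes(
    tasks: List[RunningTask], block_mb: float, bandwidth_mb_per_s: float
) -> List[str]:
    """Return nodes whose running task leaves enough time to prefetch a block.

    Assumed rule: keep a node when remaining time >= transfer time.
    """
    needed = transfer_time_s(block_mb, bandwidth_mb_per_s)
    return [t.node for t in tasks if estimate_remaining_s(t) >= needed]


if __name__ == "__main__":
    tasks = [
        RunningTask(node="node-a", progress=0.5, elapsed_s=60.0),
        RunningTask(node="node-b", progress=0.9, elapsed_s=90.0),
    ]
    # A 128 MB block over an assumed 10 MB/s link needs 12.8 s.
    print(preselect_candidate_nodes(tasks, block_mb=128.0, bandwidth_mb_per_s=10.0))
```

In this example, `node-a` has about 60 s of work left and is kept, while `node-b` has about 10 s left, less than the 12.8 s transfer, and is filtered out. A real scheduler would likely replace both the progress model and the bandwidth figure with measured values.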

Taxonomy

Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · Scheduling and Optimization Algorithms