Hadoop Performance Models

Herodotos Herodotou

arXiv:1106.0940·cs.DC·June 7, 2011·112 cites

Open Access

TL;DR

This technical report presents detailed mathematical performance models for Hadoop MapReduce. The models can be used to estimate job performance and to find optimal configuration settings for large-scale data analytics jobs.

Contribution

The report introduces models that describe dataflow and cost at the granularity of individual phases within the map and reduce tasks of a job, supporting both performance prediction and configuration optimization.

Findings

01

Models can be used to estimate Hadoop job performance

02

Models help identify optimal configuration settings

03

Enhanced understanding of MapReduce execution phases

Abstract

Hadoop MapReduce is now a popular choice for performing large-scale data analytics. This technical report describes a detailed set of mathematical performance models for describing the execution of a MapReduce job on Hadoop. The models describe dataflow and cost information at the fine granularity of phases within the map and reduce tasks of a job execution. The models can be used to estimate the performance of MapReduce jobs as well as to find the optimal configuration settings to use when running the jobs.
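The abstract describes two uses of phase-level cost models: estimating job performance and searching for configuration settings. The following is a minimal sketch of that idea only, not the report's actual equations. The phase names, the cost numbers, the wave-based scheduling assumption, and the functions `task_time`, `stage_time`, `reduce_task_time`, and `best_reducer_count` are all hypothetical and chosen purely for illustration.

```python
import math

def task_time(phase_costs):
    """Total time of one task as the sum of its per-phase costs (seconds)."""
    return sum(phase_costs.values())

def stage_time(num_tasks, num_slots, per_task_time):
    """Assumed scheduling model: tasks run in waves of at most num_slots tasks."""
    waves = math.ceil(num_tasks / num_slots)
    return waves * per_task_time

def reduce_task_time(num_reducers, shuffle_mb=1000.0, sec_per_mb=0.05, startup=2.0):
    """Hypothetical reduce-task cost: fixed startup plus cost of its data share."""
    return startup + (shuffle_mb / num_reducers) * sec_per_mb

def best_reducer_count(candidates, reduce_slots=10):
    """Pick the candidate reducer count with the lowest estimated reduce-stage time."""
    def estimated(n):
        return stage_time(n, reduce_slots, reduce_task_time(n))
    return min(candidates, key=estimated)

if __name__ == "__main__":
    # Illustrative phase costs for a single map task (made-up numbers).
    map_phases = {"read": 1.0, "map": 2.0, "spill": 0.5}
    per_map = task_time(map_phases)
    print("map task time:", per_map)
    print("map stage time:", stage_time(25, 10, per_map))
    print("best reducer count:", best_reducer_count([5, 10, 20, 40]))
```

The sketch treats a task as a sum of phase costs and a stage as a number of scheduling waves, then compares candidate settings by their estimated time. A real model would replace the made-up cost functions with ones derived from measured dataflow and cost statistics.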

Taxonomy

Topics: Cloud Computing and Resource Management · Big Data and Business Intelligence · Data Mining Algorithms and Applications