Hadoop Performance Models
Herodotos Herodotou

TL;DR
This technical report presents detailed mathematical performance models for Hadoop MapReduce that can be used to estimate job performance and to find optimal configuration settings for large-scale data analytics jobs.
Contribution
It introduces comprehensive models that describe dataflow and costs at a fine granularity within MapReduce tasks, aiding performance prediction and optimization.
Findings
Models accurately estimate Hadoop job performance
Models help identify optimal configuration settings
Enhanced understanding of MapReduce execution phases
Abstract
Hadoop MapReduce is now a popular choice for performing large-scale data analytics. This technical report describes a detailed set of mathematical performance models for describing the execution of a MapReduce job on Hadoop. The models describe dataflow and cost information at the fine granularity of phases within the map and reduce tasks of a job execution. The models can be used to estimate the performance of MapReduce jobs as well as to find the optimal configuration settings to use when running the jobs.
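As a rough illustration of what a phase-level cost model looks like, the sketch below estimates a job's running time by summing per-phase costs inside a task and multiplying by the number of task waves, then picks the cheapest setting from a small list of candidates. It is a minimal sketch under stated assumptions, not the report's actual formulas: the phase names, the linear seconds-per-MB rates, the fixed startup cost, and every function name here are hypothetical placeholders.

```python
import math
from dataclasses import dataclass


@dataclass
class PhaseCost:
    """Hypothetical linear cost for one phase of a task."""
    name: str
    seconds_per_mb: float  # assumed rate, not taken from the report


def task_time(phases, mb_per_task, startup_s):
    # A task's time is modeled as a fixed startup cost plus the sum of
    # its phase costs, each assumed linear in the data it processes.
    return startup_s + sum(p.seconds_per_mb * mb_per_task for p in phases)


def job_time(input_mb, num_tasks, slots, phases, startup_s=1.0):
    # Tasks run in waves: at most `slots` tasks execute concurrently.
    mb_per_task = input_mb / num_tasks
    waves = math.ceil(num_tasks / slots)
    return waves * task_time(phases, mb_per_task, startup_s)


def best_task_count(input_mb, slots, phases, candidates, startup_s=1.0):
    # Exhaustive search over candidate settings, keeping the one with the
    # lowest estimated job time.
    return min(
        candidates,
        key=lambda n: job_time(input_mb, n, slots, phases, startup_s),
    )


# Placeholder phases and rates, chosen only for illustration.
phases = [
    PhaseCost("read", 0.01),
    PhaseCost("map", 0.02),
    PhaseCost("spill", 0.01),
]

estimate = job_time(input_mb=1024, num_tasks=4, slots=4, phases=phases)
best = best_task_count(1024, slots=4, phases=phases, candidates=[2, 4, 6, 8])
print(f"estimated time with 4 tasks: {estimate:.2f}s, best task count: {best}")
```

The same structure covers both uses named in the abstract: `job_time` plays the role of the performance estimate, and `best_task_count` shows how an estimator can drive a search over configuration settings. A realistic model would replace the linear rates with measured, phase-specific cost functions.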
Taxonomy
Topics: Cloud Computing and Resource Management · Big Data and Business Intelligence · Data Mining Algorithms and Applications
