Benchmarking and Performance Modelling of MapReduce Communication Pattern

Sheriffo Ceesay; Adam Barker; Yuhui Lin

arXiv:2005.11608 · cs.DC · May 26, 2020

TL;DR

This paper develops phase-level performance models of the MapReduce communication pattern that predict application execution time to within about 10% error, which can support performance optimization.

Contribution

The paper introduces phase-level parametric models built on a minimal set of performance drivers taken from MapReduce internals, so execution time can be approximated without modelling a wide range of configuration parameters.

Findings

01 Model predictions fall within about ±10% of measured execution time

02 Validated on two different experimental setups

03 Applicable to unseen applications and varying datasets

Abstract

Understanding and predicting the performance of big data applications running in the cloud or on-premises could help minimise the overall cost of operations and help identify performance bottlenecks. The complexity of the low-level internals of big data frameworks and the ubiquity of application and workload configuration parameters make it challenging and expensive to come up with comprehensive performance modelling solutions. In this paper, instead of focusing on a wide range of configurable parameters, we studied the low-level internals of the MapReduce communication pattern and used a minimal set of performance drivers to develop a set of phase-level parametric models for approximating the execution time of a given application on a given cluster. Models can be used to infer the performance of unseen applications and approximate their performance…
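
To make the idea of a phase-level parametric model concrete, here is a minimal Python sketch. It is an illustration only and not the authors' model: the phase names, the `Phase` fields, the wave-based formula, and every number below are assumptions chosen for the example. Each phase's time is estimated from a handful of drivers (data volume, task count, available slots, and a per-task throughput that would come from benchmarking), and the job estimate is the sum over phases.

```python
import math
from dataclasses import dataclass


@dataclass
class Phase:
    """Hypothetical performance drivers for one MapReduce phase."""
    name: str
    data_mb: float          # total data handled by the phase (assumed)
    tasks: int              # number of tasks in the phase (assumed)
    slots: int              # tasks that can run concurrently (assumed)
    throughput_mb_s: float  # per-task rate, e.g. from a benchmark run (assumed)


def phase_time(p: Phase) -> float:
    """Estimate phase time as (number of task waves) x (time of one task)."""
    waves = math.ceil(p.tasks / p.slots)
    per_task_mb = p.data_mb / p.tasks
    return waves * per_task_mb / p.throughput_mb_s


def job_time(phases: list[Phase]) -> float:
    """Estimate job time as the sum of its phase estimates (no overlap assumed)."""
    return sum(phase_time(p) for p in phases)


def relative_error(predicted: float, measured: float) -> float:
    """Relative prediction error, as a fraction of the measured time."""
    return abs(predicted - measured) / measured


# Illustrative inputs only; these are not measurements from the paper.
phases = [
    Phase("map", data_mb=1024, tasks=8, slots=4, throughput_mb_s=32),
    Phase("shuffle", data_mb=512, tasks=4, slots=4, throughput_mb_s=64),
    Phase("reduce", data_mb=512, tasks=4, slots=4, throughput_mb_s=16),
]
predicted = job_time(phases)  # 8.0 + 2.0 + 8.0 = 18.0 seconds
error = relative_error(predicted, measured=20.0)  # 0.1, i.e. 10%
```

Summing the phases assumes they run strictly one after another; a model that accounts for overlap between phases (for example, shuffle starting before all map tasks finish) would need an extra term.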
