Benchmarking and Performance Modelling of MapReduce Communication Pattern
Sheriffo Ceesay, Adam Barker, Yuhui Lin

TL;DR
This paper develops phase-level performance models for MapReduce communication patterns that predict application execution times to within roughly ±10% error, aiding performance optimization.
Contribution
It introduces minimal-parameter, phase-level models focusing on MapReduce internals, enabling performance prediction without extensive configuration tuning.
Findings
Models achieve ±10% error rate in predictions
Validated on two different experimental setups
Applicable to unseen applications and varying datasets
Abstract
Understanding and predicting the performance of big data applications running in the cloud or on-premises could help minimise the overall cost of operations and provide opportunities to identify performance bottlenecks. The complexity of the low-level internals of big data frameworks and the ubiquity of application and workload configuration parameters make it challenging and expensive to come up with comprehensive performance modelling solutions. In this paper, instead of focusing on a wide range of configurable parameters, we studied the low-level internals of the MapReduce communication pattern and used a minimal set of performance drivers to develop a set of phase-level parametric models for approximating the execution time of a given application on a given cluster. Models can be used to infer the performance of unseen applications and approximate their performance…
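To make the idea of a phase-level parametric model concrete, here is a minimal Python sketch. It is not the authors' model: the summary above does not give their equations or their choice of performance drivers. The sketch assumes a job's runtime can be approximated as the sum of map, shuffle, and reduce phase times, each driven by a data volume, a per-task throughput, and the number of scheduling waves. All names (`PhaseRates`, `estimate_runtime`, `selectivity`) and all numeric values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PhaseRates:
    """Hypothetical per-task throughput for each phase, in MB/s.

    In practice these would be measured by benchmarking the target cluster;
    the values used below are made up for illustration.
    """
    map_mb_s: float
    shuffle_mb_s: float
    reduce_mb_s: float


def waves(num_tasks: int, slots: int) -> int:
    """Scheduling waves needed to run num_tasks on `slots` parallel slots."""
    return math.ceil(num_tasks / slots)


def estimate_runtime(input_mb: float, block_mb: float, num_reducers: int,
                     slots: int, rates: PhaseRates,
                     selectivity: float = 1.0) -> float:
    """Approximate job runtime in seconds as map + shuffle + reduce.

    selectivity is the assumed ratio of map output size to map input size.
    """
    # Map phase: one task per input block, executed in waves.
    num_maps = math.ceil(input_mb / block_mb)
    map_time = waves(num_maps, slots) * (block_mb / rates.map_mb_s)

    # Intermediate data is assumed to be spread evenly across reducers.
    per_reducer_mb = (input_mb * selectivity) / num_reducers
    reduce_waves = waves(num_reducers, slots)

    shuffle_time = reduce_waves * (per_reducer_mb / rates.shuffle_mb_s)
    reduce_time = reduce_waves * (per_reducer_mb / rates.reduce_mb_s)
    return map_time + shuffle_time + reduce_time


def relative_error(predicted: float, measured: float) -> float:
    """Signed relative error of a prediction against a measured runtime."""
    return (predicted - measured) / measured


if __name__ == "__main__":
    rates = PhaseRates(map_mb_s=64.0, shuffle_mb_s=32.0, reduce_mb_s=64.0)
    predicted = estimate_runtime(input_mb=1024, block_mb=128, num_reducers=4,
                                 slots=4, rates=rates, selectivity=0.5)
    print(f"predicted runtime: {predicted:.1f} s")
    # Compare against a made-up measured runtime of 10.5 s.
    print(f"relative error: {relative_error(predicted, 10.5):+.1%}")
```

The point of the sketch is that the model takes only a handful of inputs (data size, block size, task counts, slots, and per-phase rates) rather than the full set of framework configuration parameters. A prediction would meet the reported error band when `abs(relative_error(...))` is at most about 0.10.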
