Network Load Analysis and Provisioning of MapReduce Applications
Nikzad Babaii Rizvandi, Javid Taheri, Reza Moraveji, Albert Y. Zomaya

TL;DR
This paper analyzes how two MapReduce configuration parameters, the number of mappers and the number of reducers, affect network load during the shuffle phase, and proposes a multivariate linear regression model that predicts network load from these parameters.
Contribution
It introduces a profiling and regression-based modeling approach to predict the network load of MapReduce applications during the shuffle phase.
Findings
The model accurately predicts network load for the three tested applications (WordCount, Exim Mainlog parsing, and TeraSort).
Network load depends significantly on the number of mappers and the number of reducers.
The method can assist in resource provisioning and optimization.
Abstract
In this paper, we study the dependency between configuration parameters and the network load of fixed-size MapReduce applications in the shuffle phase, and then propose an analytical method to model this dependency. Our approach consists of three key phases: profiling, modeling, and prediction. In the first phase, an application is run several times with different sets of MapReduce configuration parameters (here, the number of mappers and the number of reducers) to profile the network load of the application in the shuffle phase on a given cluster. Then, the relation between these parameters and the network load is modeled by multivariate linear regression. Three applications (WordCount, Exim Mainlog parsing, and TeraSort) are used to evaluate our technique on a 4-node private MapReduce cluster.
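The profiling, modeling, and prediction phases described in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible model form, load ≈ b0 + b1·mappers + b2·reducers, which may differ from the features the authors actually use. The profiling data here is synthetic, and the function names, the configuration grid, and the coefficient values are all hypothetical.

```python
import numpy as np


def fit_network_load_model(mappers, reducers, load):
    """Least-squares fit of load ~ b0 + b1*mappers + b2*reducers (assumed form)."""
    mappers = np.asarray(mappers, dtype=float)
    reducers = np.asarray(reducers, dtype=float)
    # Design matrix: intercept column, then one column per parameter.
    X = np.column_stack([np.ones(len(mappers)), mappers, reducers])
    coeffs, *_ = np.linalg.lstsq(X, np.asarray(load, dtype=float), rcond=None)
    return coeffs


def predict_network_load(coeffs, mappers, reducers):
    """Evaluate the fitted linear model for a new configuration."""
    return coeffs[0] + coeffs[1] * mappers + coeffs[2] * reducers


# Profiling phase (synthetic stand-in): one "run" per (mappers, reducers) pair.
grid = [(m, r) for m in (4, 8, 16, 32) for r in (2, 4, 8, 16)]
mappers = np.array([m for m, _ in grid], dtype=float)
reducers = np.array([r for _, r in grid], dtype=float)

# Hypothetical ground-truth coefficients plus measurement noise.
true_coeffs = np.array([100.0, 1.5, 3.0])
rng = np.random.default_rng(0)
load = (true_coeffs[0] + true_coeffs[1] * mappers + true_coeffs[2] * reducers
        + rng.normal(0.0, 0.5, size=len(grid)))

# Modeling phase: fit the regression on the profiled runs.
coeffs = fit_network_load_model(mappers, reducers, load)

# Prediction phase: estimate load for a configuration that was not profiled.
predicted = predict_network_load(coeffs, 20, 10)
print("fitted coefficients:", coeffs)
print("predicted load for 20 mappers, 10 reducers:", predicted)
```

In practice the `load` array would hold shuffle-phase network measurements from real runs on the target cluster, and the fitted coefficients would only be meaningful for that application, input size, and cluster.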
Taxonomy
Topics: Cloud Computing and Resource Management · Software System Performance and Reliability · Data Mining Algorithms and Applications
