On Modelling and Prediction of Total CPU Usage for Applications in MapReduce Environments
Nikzad Babaii Rizvandi, Javid Taheri, Reza Moraveji, Albert Y. Zomaya

TL;DR
This paper presents a polynomial regression-based model to predict total CPU usage of MapReduce jobs, aiding resource provisioning and configuration parameter selection in cloud environments.
Contribution
It introduces a novel approach to model and predict total CPU usage based on configuration parameters and input data scaling in MapReduce environments.
Findings
Prediction error within 8% of the actual total CPU usage
Model validated on three real-world applications
Input data scaling influences total CPU usage
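The "within 8%" finding is a relative-error criterion. This summary does not state the exact error metric the authors used. The sketch below assumes it is the absolute relative error |predicted − actual| / |actual|. The helper names `relative_error` and `within_tolerance` are hypothetical.

```python
def relative_error(predicted: float, actual: float) -> float:
    """Absolute relative error |predicted - actual| / |actual| (0.08 == 8%).

    Assumption: this is the accuracy metric behind the 8% figure; the
    summary does not define it explicitly.
    """
    if actual == 0:
        raise ValueError("actual CPU usage must be non-zero")
    return abs(predicted - actual) / abs(actual)


def within_tolerance(predicted: float, actual: float, tol: float = 0.08) -> bool:
    """True if the predicted CPU cycles lie within `tol` (default 8%) of the measured value."""
    return relative_error(predicted, actual) <= tol
```

For example, a prediction of 1.05e12 clock cycles against a measured 1.0e12 gives a relative error of 0.05 and passes the 8% check. A prediction of 1.10e12 (error 0.10) fails it.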
Abstract
Recently, businesses have started using MapReduce as a popular computation framework for processing large amounts of data, for tasks such as spam detection and various data mining jobs, in both public and private clouds. Two challenging questions in such environments are (1) choosing suitable values for MapReduce configuration parameters (e.g., number of mappers, number of reducers, and DFS block size), and (2) predicting the amount of resources that a user should lease from the service provider. Currently, both choosing configuration parameters and estimating required resources are solely the users' responsibility. In this paper, we present an approach for provisioning the total CPU usage, in clock cycles, of jobs in a MapReduce environment. For a MapReduce job, a profile of total CPU usage in clock cycles is built from the job's past executions with different values of two…
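The profiling-and-regression idea described above can be sketched in a few lines. Past runs are recorded as (configuration, total CPU cycles) pairs, a polynomial is fitted to them by least squares, and the fitted model is then evaluated for a new configuration. This is a minimal illustration, not the authors' implementation. The abstract is truncated before naming the two parameters, so this sketch assumes, purely for illustration, that they are the number of mappers and the number of reducers. It also assumes a full second-order polynomial fitted by ordinary least squares via the normal equations. The functions `features`, `solve`, `fit`, and `predict`, and the synthetic profile data, are hypothetical and are not measurements from the paper.

```python
"""Sketch: model total CPU cycles as a quadratic in two configuration parameters.

Assumptions (not from the paper): the parameters are number of mappers (m) and
number of reducers (r), the model is a full 2nd-order polynomial, and the
coefficients are fitted by ordinary least squares (normal equations).
"""
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # (num_mappers, num_reducers, total_cpu_cycles)


def features(mappers: float, reducers: float) -> List[float]:
    """Quadratic feature vector: 1, m, r, m^2, m*r, r^2."""
    return [1.0, mappers, reducers, mappers * mappers,
            mappers * reducers, reducers * reducers]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("singular system: too few distinct profile points")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        acc = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = acc / aug[i][i]
    return x


def fit(samples: Sequence[Sample]) -> List[float]:
    """Least-squares coefficients from past runs via X^T X c = X^T y."""
    rows = [features(m, r) for m, r, _ in samples]
    ys = [y for _, _, y in samples]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(coeffs: Sequence[float], mappers: float, reducers: float) -> float:
    """Predicted total CPU clock cycles for a new configuration."""
    return sum(c * f for c, f in zip(coeffs, features(mappers, reducers)))


if __name__ == "__main__":
    # Synthetic profile for illustration only, NOT data from the paper.
    def synthetic_cycles(m: float, r: float) -> float:
        return 2e9 + 1e8 * m + 3e8 * r + 5e6 * m * m + 1e7 * m * r + 2e7 * r * r

    profile = [(m, r, synthetic_cycles(m, r))
               for m in (4, 8, 12, 16) for r in (1, 2, 4, 8)]
    coeffs = fit(profile)
    print(f"predicted cycles for 10 mappers, 3 reducers: {predict(coeffs, 10, 3):.4e}")
```

Solving the normal equations directly keeps the sketch dependency-free. For larger or noisier profiles, a QR- or SVD-based least-squares solver is numerically safer. The quadratic needs at least six distinct configurations to be identifiable. The 4 × 4 grid above provides sixteen.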
Taxonomy
Topics: Cloud Computing and Resource Management · Data Mining Algorithms and Applications · Big Data and Business Intelligence
