Statistical Regression to Predict Total Cumulative CPU Usage of MapReduce Jobs

Nikzad Babaii Rizvandi; Javid Taheri; Reza Moraveji; Albert Y. Zomaya

arXiv:1303.3632 · cs.DC · March 18, 2013

TL;DR

This paper introduces a polynomial regression model to predict total CPU usage of MapReduce jobs based on configuration parameters, aiding resource provisioning and scaling in cloud environments.

Contribution

It presents a novel approach using regression analysis to accurately estimate CPU usage from configuration settings and input data size in MapReduce jobs.

Findings

01 Prediction accuracy within 8% of actual CPU usage
02 Model validated on three real-world applications
03 Influence of input data scaling on CPU usage analyzed

Abstract

Recently, businesses have started using MapReduce as a popular computation framework for processing large amounts of data, for tasks such as spam detection and various data mining jobs, in both public and private clouds. Two challenging questions in such environments are (1) choosing suitable values for MapReduce configuration parameters, e.g., number of mappers, number of reducers, and DFS block size, and (2) predicting the amount of resources that a user should lease from the service provider. Currently, both choosing configuration parameters and estimating required resources are solely the user's responsibility. In this paper, we present an approach to provision the total CPU usage, in clock cycles, of jobs in a MapReduce environment. For a MapReduce job, a profile of total CPU usage in clock cycles is built from the job's past executions with different values of two…
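To make the idea of profiling past executions and regressing CPU usage on configuration parameters concrete, the following is a minimal sketch and not the authors' implementation. The truncated abstract does not say which two parameters are profiled, so the number of mappers and the number of reducers are used here purely as an illustrative assumption. The function names (`design_matrix`, `fit`, `predict`, `mean_relative_error`), the polynomial degree, and the synthetic "past executions" are all hypothetical.

```python
import numpy as np

def design_matrix(mappers, reducers, degree=2):
    """Polynomial features m^i * r^j for all i + j <= degree (includes the constant 1)."""
    m = np.asarray(mappers, dtype=float)
    r = np.asarray(reducers, dtype=float)
    cols = [m**i * r**j
            for i in range(degree + 1)
            for j in range(degree + 1 - i)]
    return np.column_stack(cols)

def fit(mappers, reducers, cpu_cycles, degree=2):
    """Least-squares fit of polynomial coefficients to profiled CPU usage."""
    X = design_matrix(mappers, reducers, degree)
    y = np.asarray(cpu_cycles, dtype=float)
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def predict(coeffs, mappers, reducers, degree=2):
    """Predicted total CPU usage for new configuration values."""
    return design_matrix(mappers, reducers, degree) @ coeffs

def mean_relative_error(actual, predicted):
    """Average of |predicted - actual| / actual."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(predicted - actual) / actual))

# Synthetic stand-in for a job profile (NOT data from the paper):
# an assumed quadratic relationship plus about 2% multiplicative noise.
rng = np.random.default_rng(0)
m = rng.integers(4, 41, size=80)
r = rng.integers(4, 41, size=80)
clean = 5e9 + 2e8 * m + 1e8 * r + 3e6 * m * r
cycles = clean * (1.0 + 0.02 * rng.standard_normal(80))

# Fit on the first 60 runs, evaluate on the remaining 20.
coeffs = fit(m[:60], r[:60], cycles[:60])
err = mean_relative_error(cycles[60:], predict(coeffs, m[60:], r[60:]))
print(f"mean relative error on held-out runs: {err:.2%}")
```

Holding out some profiled runs, as done here, is one simple way to check how far predictions fall from measured usage. The error printed by this sketch reflects only the synthetic noise level and says nothing about the figures reported in the paper.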

Taxonomy

Topics: Cloud Computing and Resource Management · Big Data and Business Intelligence · Data Stream Mining Techniques