Characterization of Performance Anomalies in Hadoop

Puja Gupta

arXiv:1505.01919·cs.DC·May 14, 2015

TL;DR

This paper presents a decision tree-based method to characterize performance anomalies in Hadoop by modeling execution time variations with different system settings, enabling anomaly detection and performance insights.

Contribution

The study introduces a decision tree approach to predict execution time ranges and detect anomalies in Hadoop workloads, providing a new tool for performance analysis.

Findings

01. 99% of samples' execution times fell within predicted ranges

02. Impact of execution parameters relates to their position in the decision tree

03. Configuration insights can be derived from the trained decision tree

Abstract

With the huge variety of data and equally large-scale systems, no single execution setting can guarantee the best performance for every query. In this project, we tried to study the impact of different execution settings on the execution time of workloads by varying them one at a time. Using the data from these experiments, a decision tree was built in which each internal node represents an execution parameter, each branch represents a value chosen for that parameter, and each leaf node represents a range for execution time in minutes. The attribute on which to split the dataset is selected based on the maximum information gain or lowest entropy. Once the tree is trained with the training samples, it can be used to get an approximate range for the expected execution time. When the actual execution time differs from this expected value, a…
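The tree construction described in the abstract can be illustrated with a minimal ID3-style sketch. This is not the author's implementation: the parameter names (`reducers`, `compression`), their values, the execution-time ranges, and every function name below are hypothetical and chosen only for illustration. The sketch assumes categorical parameter values and leaves that hold a `(low, high)` range in minutes.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the samples on `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs):
    """Internal node = parameter, branch = parameter value, leaf = time range."""
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # most frequent range
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    partitions = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = partitions.setdefault(row[best], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    return {
        "attr": best,
        "children": {value: build_tree(r, l, remaining)
                     for value, (r, l) in partitions.items()},
    }


def predict(tree, row):
    """Walk the tree; returns a (low, high) range, or None for unseen values."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["attr"]])
    return tree


def is_anomaly(tree, row, actual_minutes):
    """Flag a run whose measured time falls outside the predicted range."""
    predicted = predict(tree, row)
    if predicted is None:
        raise ValueError("no prediction for this combination of settings")
    low, high = predicted
    return not (low <= actual_minutes < high)


# Hypothetical training samples: settings -> execution-time range in minutes.
rows = [
    {"reducers": "4", "compression": "on"},
    {"reducers": "4", "compression": "off"},
    {"reducers": "16", "compression": "on"},
    {"reducers": "16", "compression": "off"},
]
labels = [(0, 5), (5, 10), (0, 5), (5, 10)]

tree = build_tree(rows, labels, ["reducers", "compression"])
```

In this toy dataset the `compression` setting alone determines the time range, so it yields the highest information gain and becomes the root. A run with `compression` set to `on` that takes 12 minutes would fall outside the predicted `(0, 5)` range and be flagged by `is_anomaly`.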

Taxonomy

Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Peer-to-Peer Network Technologies