Exploring Non-Homogeneity and Dynamicity of High Scale Cloud through Hive and Pig

Kashish Ara Shakil, Mansaf Alam (Member, IAENG), Shuchi Sethi

arXiv:1503.06600·cs.DC·March 24, 2015

Open Access

TL;DR

This paper analyzes large-scale cloud workload data using Hive and Pig, revealing insights into job clustering, arrival patterns, and resource usage distributions in a heterogeneous and dynamic cloud environment.

Contribution

It introduces a novel analytical method combining Hive and Pig for large-scale cloud workload analysis, providing new insights into workload distributions and clustering.

Findings

01. Job arrival times follow a Weibull distribution

02. Resource usage distribution is Zipf-like

03. Process runtimes exhibit a heavy-tailed distribution

Abstract

Cloud computing involves heterogeneity and dynamicity at all levels, so resources in such an environment must be managed and allocated properly. Resource planning and scheduling require a proper understanding of arrival patterns and scheduling of resources. Studying workloads aids in understanding their associated environment. Google released the latest version of its cluster trace, version 2.1, in November 2014. The trace consists of cell information covering about 29 days and spanning 700k jobs. This paper presents a statistical analysis of this cluster trace. Since the trace is very large, Hive, a Hadoop Distributed File System (HDFS) based platform for querying and analyzing Big data, has been used. Hive was accessed through its Beeswax interface. The data was imported into HDFS through HCatalog. Apart from Hive, Pig which…

Taxonomy

Topics: Cloud Computing and Resource Management · Data Stream Mining Techniques · IoT and Edge/Fog Computing