Analysis and Clustering of Workload in Google Cluster Trace based on Resource Usage
Mansaf Alam, Kashish Ara Shakil, Shuchi Sethi

TL;DR
This paper analyzes the Google Cluster Trace to understand resource usage, identifies workload patterns through clustering, and classifies jobs, providing insights into production cloud environments and revealing new findings about job characteristics.
Contribution
It offers a detailed statistical profile of jobs, applies k-means clustering for workload pattern analysis, and classifies jobs into types, highlighting novel insights like trimodal job distributions.
Findings
Jobs in the trace are trimodal.
Symmetry exists in tasks within long jobs.
Clustering reveals distinct workload patterns.
Abstract
Cloud computing has gained interest among commercial organizations, research communities, developers and other individuals during the past few years. To move ahead with research in data management and the processing of such data, we need benchmark datasets that are freely and publicly accessible. In May 2011, Google released a trace of a cluster of 11k machines, referred to as the Google Cluster Trace. This trace contains cell information covering about 29 days. This paper analyzes resource usage and requirements in this trace and attempts to give insight into production traces of this kind, similar to those found in cloud environments. The major contributions of this paper include a statistical profile of jobs based on resource usage, clustering of workload patterns, and classification of jobs into different types based on k-means clustering. Though there have…
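The k-means step named in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the `jobs` values are hypothetical normalized (CPU, memory) usage pairs rather than data from the trace, the choice of two features and `k = 3` is arbitrary, and the farthest-point initialization is a convenience to keep the example deterministic, not necessarily what the authors used.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples."""
    # Farthest-point initialization (an assumption of this sketch):
    # start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels

        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical jobs as (normalized CPU usage, normalized memory usage).
jobs = [
    (0.02, 0.03), (0.03, 0.02), (0.04, 0.04),  # light jobs
    (0.30, 0.25), (0.28, 0.30), (0.32, 0.27),  # moderate jobs
    (0.80, 0.75), (0.85, 0.80), (0.78, 0.82),  # heavy jobs
]

labels, centroids = kmeans(jobs, k=3)
for job, label in zip(jobs, labels):
    print(job, "-> cluster", label)
```

On real trace data the features would first need to be aggregated per job and scaled to comparable ranges, since k-means relies on Euclidean distance and is sensitive to feature scale.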
