Scalable Infrastructure for Workload Characterization of Cluster Traces
Thomas van Loo, Anshul Jindal, Shajulin Benedict, Mohak Chadha, Michael Gerndt

TL;DR
This paper introduces a scalable infrastructure leveraging Google's Dataproc to analyze and characterize heterogeneous cloud workload traces, aiding cloud providers and users in resource management.
Contribution
It presents a novel scalable infrastructure for workload trace analysis in cloud environments, specifically tailored for large production clusters like Google Cloud.
Findings
Workload heterogeneity varies significantly across jobs.
Resource consumption patterns differ based on workload types.
The infrastructure effectively analyzes large-scale cloud workload traces.
Abstract
In the recent past, characterizing workloads has been attempted to gain a foothold in the emerging serverless cloud market, especially in the large production cloud clusters of Google, AWS, and so forth. While analyzing and characterizing real workloads from a large production cloud cluster benefits cloud providers, researchers, and daily users, analyzing the workload traces of these clusters has been an arduous task due to the heterogeneous nature of the data. This article proposes a scalable infrastructure based on Google's Dataproc for analyzing the workload traces of cloud environments. We evaluate the proposed infrastructure using the Google Cloud cluster-usage-traces-v3 workload traces. We perform workload characterization on this dataset, focusing on the heterogeneity of the workload, the variations in job durations, aspects of resource consumption, and…
