Towards Cloud Efficiency with Large-scale Workload Characterization
Anjaly Parayil, Jue Zhang, Xiaoting Qin, Íñigo Goiri, Lexiang Huang, Timothy Zhu, and Chetan Bansal

TL;DR
This paper presents a large-scale empirical analysis of Microsoft cloud workloads to understand their characteristics, aiming to improve cloud efficiency and reliability through better workload understanding.
Contribution
It provides the first large-scale characterization of first-party cloud workloads at Microsoft, identifying the workload characteristics that affect efficiency and reliability.
Findings
Identified critical workload features affecting cloud performance
Analyzed variation of workload characteristics across different services
Provided insights for optimizing cloud resource management
Abstract
Cloud providers introduce features (e.g., Spot VMs, Harvest VMs, and Burstable VMs) and optimizations (e.g., oversubscription, auto-scaling, power harvesting, and overclocking) to improve efficiency and reliability. To effectively utilize these features, it's crucial to understand the characteristics of workloads running in the cloud. However, workload characteristics can be complex and depend on multiple signals, making manual characterization difficult and unscalable. In this study, we conduct the first large-scale examination of first-party workloads at Microsoft to understand their characteristics. Through an empirical study, we aim to answer the following questions: (1) What are the critical workload characteristics that impact efficiency and reliability on cloud platforms? (2) How do these characteristics vary across different workloads? (3) How can cloud platforms leverage these…
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · IoT and Edge/Fog Computing
