Interference and Need Aware Workload Colocation in Hyperscale Datacenters
Sayak Chakraborti, Brian Coutinho, Sandhya Dwarkadas, Parth Malani, Bikash Sharma

TL;DR
This paper proposes a tunable resource allocation approach for hyperscale datacenters that balances operational efficiency and SLO guarantees by accounting for workload interference and platform heterogeneity.
Contribution
The paper introduces a tunable resource management method that combines online service classification with offline sensitivity analysis to improve workload colocation.
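The efficiency-versus-SLO trade-off behind a tunable allocator can be illustrated with a small sketch. This summary does not give the paper's actual sizing or placement policy, so everything below is an assumption. The `Service` record, the `knob` parameter (0.0 sizes each request at peak observed usage, 1.0 at the median), the per-service `sensitivity` factor (a stand-in for the output of offline sensitivity analysis), and first-fit-decreasing packing are all hypothetical illustrations, not the authors' method.

```python
import math
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    cpu_samples: list[float]  # hypothetical observed CPU usage, in cores
    sensitivity: float        # hypothetical offline-measured interference headroom (0.1 = +10%)


def percentile(samples: list[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def size_request(svc: Service, knob: float) -> float:
    """knob=0.0 sizes at peak usage (conservative); knob=1.0 at the median (efficient)."""
    q = 100 - 50 * knob                   # interpolate the percentile from p100 down to p50
    base = percentile(svc.cpu_samples, q)
    return base * (1 + svc.sensitivity)   # add interference headroom


def pack(requests: list[float], capacity: float) -> int:
    """First-fit-decreasing bin packing; returns the number of machines used."""
    free: list[float] = []                # remaining capacity per machine
    for r in sorted(requests, reverse=True):
        for i, f in enumerate(free):
            if r <= f:
                free[i] = f - r
                break
        else:                             # no existing machine fits: open a new one
            free.append(capacity - r)
    return len(free)


services = [
    Service("web", [2, 3, 3, 4, 8], 0.10),
    Service("cache", [1, 1, 2, 2, 6], 0.30),
    Service("batch", [4, 4, 5, 5, 9], 0.05),
    Service("feed", [2, 2, 3, 3, 7], 0.20),
]
machines_by_knob = {
    knob: pack([size_request(s, knob) for s in services], capacity=16.0)
    for knob in (0.0, 1.0)
}
print(machines_by_knob)  # {0.0: 4, 1.0: 1}
```

With this made-up data, the conservative setting needs four 16-core machines, while the efficient setting fits every service on one. The gap comes entirely from sizing requests below peak usage. That lost headroom is also what puts SLOs at risk when load spikes, which is why a tunable knob rather than a fixed policy is useful.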
Findings
Up to 50% reduction in required machines when tuning for efficiency.
Up to 40% reduction in TCO and 60% reduction in resource fragmentation.
SLO violations can be reduced by 22% with interference-aware colocation.
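Interference-aware colocation can be sketched as a placement rule that checks a table of predicted pairwise slowdowns before putting two services on the same host. The class names (`cpu`, `mem`), the slowdown values in `SLOWDOWN`, the `max_slowdown` limit, and the first-fit policy are hypothetical. This summary does not describe how the paper encodes or applies its sensitivity data.

```python
from itertools import combinations

# Hypothetical offline sensitivity table: predicted slowdown when two
# service classes share a host (keys are sorted class-name pairs).
SLOWDOWN = {
    ("cpu", "cpu"): 0.15,
    ("cpu", "mem"): 0.05,
    ("mem", "mem"): 0.40,
}


def slowdown(a: str, b: str) -> float:
    return SLOWDOWN[tuple(sorted((a, b)))]


def place(jobs, capacity, max_slowdown=None):
    """First-fit placement of (name, cls, cores) jobs.

    With max_slowdown=None placement is interference-blind; otherwise a job
    may join a host only if its predicted slowdown with every resident
    stays within max_slowdown.
    """
    hosts = []  # each host: {"free": cores left, "classes": resident classes}
    for _name, cls, cores in jobs:
        for h in hosts:
            fits = cores <= h["free"]
            compatible = max_slowdown is None or all(
                slowdown(cls, c) <= max_slowdown for c in h["classes"]
            )
            if fits and compatible:
                h["free"] -= cores
                h["classes"].append(cls)
                break
        else:  # no existing host accepts the job: open a new one
            hosts.append({"free": capacity - cores, "classes": [cls]})
    return hosts


def worst_slowdown(hosts) -> float:
    """Largest predicted pairwise slowdown on any host (0.0 if nothing is colocated)."""
    return max(
        (slowdown(a, b) for h in hosts for a, b in combinations(h["classes"], 2)),
        default=0.0,
    )


jobs = [("a", "mem", 4), ("b", "mem", 4), ("c", "cpu", 4), ("d", "cpu", 4)]
naive = place(jobs, capacity=16)
aware = place(jobs, capacity=16, max_slowdown=0.2)
print(len(naive), worst_slowdown(naive))  # 1 0.4
print(len(aware), worst_slowdown(aware))  # 2 0.15
```

In this example the interference-blind placement packs all four jobs onto one host but pairs the two memory-bound jobs at a predicted 40% slowdown. The interference-aware placement uses a second host and caps the worst pair at 15%. Raising or lowering `max_slowdown` is one simple way to expose an efficiency-versus-SLO knob.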
Abstract
Datacenters suffer from resource utilization inefficiencies due to the conflicting goals of service owners and platform providers. Service owners intending to maintain Service Level Objectives (SLO) for themselves typically request a conservative amount of resources. Platform providers want to increase operational efficiency to reduce capital and operating costs. Achieving both operational efficiency and SLO for individual services at the same time is challenging due to the diversity in service workload characteristics, resource usage patterns that are dependent on input load, heterogeneity in platform, memory, I/O, and network architecture, and resource bundling. This paper presents a tunable approach to resource allocation that accounts for both dynamic service resource needs and platform heterogeneity. In addition, an online K-Means-based service classification method is used in…
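The abstract mentions an online K-Means-based service classification method, but its description is truncated here. The following is a minimal sketch of sequential (MacQueen-style) online k-means. Each incoming service's feature vector is assigned to its nearest centroid, and only that centroid is moved toward it by a running-mean step. The three features (normalized CPU utilization, memory bandwidth, and last-level-cache miss rate), the two seed centroids, and the `OnlineKMeans` class are assumptions for illustration, not the paper's feature set or implementation.

```python
import math


class OnlineKMeans:
    """Sequential (MacQueen-style) k-means: each new point moves only its nearest centroid."""

    def __init__(self, initial_centroids):
        self.centroids = [list(c) for c in initial_centroids]
        self.counts = [1] * len(self.centroids)  # treat each seed as one observation

    def classify(self, x):
        """Index of the centroid nearest to x (Euclidean distance)."""
        return min(range(len(self.centroids)), key=lambda k: math.dist(x, self.centroids[k]))

    def update(self, x):
        """Assign x to its nearest cluster and shift that centroid by a running-mean step."""
        k = self.classify(x)
        self.counts[k] += 1
        eta = 1.0 / self.counts[k]
        self.centroids[k] = [c + eta * (xi - c) for c, xi in zip(self.centroids[k], x)]
        return k


# Hypothetical per-service features: (CPU util, memory bandwidth, LLC miss rate), normalized to [0, 1].
model = OnlineKMeans([[0.9, 0.2, 0.1],    # seed for a CPU-bound class
                      [0.3, 0.8, 0.7]])   # seed for a memory-bound class
stream = [[0.85, 0.25, 0.15], [0.35, 0.75, 0.8], [0.8, 0.3, 0.2], [0.25, 0.9, 0.6]]
labels = [model.update(x) for x in stream]
print(labels)  # [0, 1, 0, 1]
```

Each update touches one centroid and costs O(k·d) for k clusters and d features. New services can therefore be classified as they arrive without re-clustering the whole fleet. A scheduler could then use the resulting class label to look up offline sensitivity data for that class.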
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · IoT and Edge/Fog Computing
