Interference and Need Aware Workload Colocation in Hyperscale Datacenters

Sayak Chakraborti; Brian Coutinho; Sandhya Dwarkadas; Parth Malani; Bikash Sharma

arXiv:2207.12499·cs.DC·July 27, 2022

TL;DR

This paper proposes a tunable resource allocation approach for hyperscale datacenters that balances operational efficiency and SLO guarantees by accounting for workload interference and platform heterogeneity.

Contribution

It introduces a novel, tunable resource management method combining online service classification and offline sensitivity analysis for improved workload colocation.

Findings

01. Up to 50% reduction in required machines when tuning for efficiency.

02. Up to 40% reduction in TCO and 60% reduction in resource fragmentation.

03. SLO violations can be reduced by 22% with interference-aware colocation.

Abstract

Datacenters suffer from resource utilization inefficiencies due to the conflicting goals of service owners and platform providers. Service owners intending to maintain Service Level Objectives (SLO) for themselves typically request a conservative amount of resources. Platform providers want to increase operational efficiency to reduce capital and operating costs. Achieving both operational efficiency and SLO for individual services at the same time is challenging due to the diversity in service workload characteristics, resource usage patterns that are dependent on input load, heterogeneity in platform, memory, I/O, and network architecture, and resource bundling. This paper presents a tunable approach to resource allocation that accounts for both dynamic service resource needs and platform heterogeneity. In addition, an online K-Means-based service classification method is used in…
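The abstract is cut off before it describes the classification step, so the following is only a minimal sketch of what an online K-Means classifier can look like, not the authors' implementation. It uses the standard sequential (MacQueen-style) update, where each incoming sample moves its nearest centroid by a step of 1/count. The class name `OnlineKMeans`, the two-dimensional feature vector (CPU utilization, memory bandwidth utilization), and the sample values are all assumptions made for illustration.

```python
from math import dist


class OnlineKMeans:
    """Sequential k-means: one centroid update per incoming sample."""

    def __init__(self, k):
        self.k = k
        self.centroids = []  # one list of floats per cluster
        self.counts = []     # number of samples absorbed by each cluster

    def predict(self, x):
        """Return the index of the centroid nearest to x."""
        return min(range(len(self.centroids)),
                   key=lambda i: dist(self.centroids[i], x))

    def partial_fit(self, x):
        """Assign x to a cluster, update that centroid, return its index."""
        x = [float(v) for v in x]
        if len(self.centroids) < self.k:
            # Assumption: seed the first k centroids with the first k samples.
            self.centroids.append(x)
            self.counts.append(1)
            return len(self.centroids) - 1
        j = self.predict(x)
        self.counts[j] += 1
        step = 1.0 / self.counts[j]
        # Move the winning centroid toward x; this keeps it at the running mean.
        self.centroids[j] = [c + step * (v - c)
                             for c, v in zip(self.centroids[j], x)]
        return j


# Hypothetical per-service samples: (cpu_utilization, memory_bandwidth_utilization)
samples = [(0.9, 0.1), (0.1, 0.8), (0.85, 0.15), (0.2, 0.9)]
model = OnlineKMeans(k=2)
labels = [model.partial_fit(s) for s in samples]
print(labels)
```

Because each update touches a single centroid, the classifier can follow a stream of utilization samples without storing them, which is the usual reason to prefer an online variant over batch K-Means.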

Taxonomy

Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · IoT and Edge/Fog Computing