Analytically-Driven Resource Management for Cloud-Native Microservices

Yanqi Zhang; Zhuangzhuang Zhou; Sameh Elnikety; Christina Delimitrou

arXiv:2401.02920 · cs.DC · January 8, 2024 · 1 citation

Open Access

TL;DR

Ursa is a lightweight, analytical resource management system for cloud-native microservices that significantly reduces data collection and control overhead while maintaining SLA compliance and resource efficiency.

Contribution

Ursa introduces an analytical model-based approach that decomposes SLAs and explores microservices individually, outperforming ML-driven methods in speed and resource management.
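As a rough illustration of what decomposing an end-to-end SLA into per-service SLAs can look like, the sketch below splits a latency budget across services in proportion to a measured baseline latency. This proportional rule, the assumption of a single sequential call path, and the names `decompose_sla`, `frontend`, `compose`, and `storage` are all hypothetical and chosen for illustration; the summary above does not specify Ursa's actual model.

```python
def decompose_sla(end_to_end_sla_ms, baseline_latency_ms):
    """Split an end-to-end latency SLA into per-service latency budgets.

    Assumption (not from the paper): the services lie on one sequential
    call path, so per-service budgets add up to the end-to-end budget,
    and each service receives a share proportional to its baseline latency.
    """
    total = sum(baseline_latency_ms.values())
    if total <= 0:
        raise ValueError("baseline latencies must sum to a positive value")
    return {
        name: end_to_end_sla_ms * latency / total
        for name, latency in baseline_latency_ms.items()
    }


# Hypothetical three-tier application with a 200 ms end-to-end SLA.
budgets = decompose_sla(
    200.0,
    {"frontend": 10.0, "compose": 30.0, "storage": 60.0},
)
for service, budget_ms in budgets.items():
    print(f"{service}: {budget_ms:.1f} ms")
```

Once every service has its own budget, each one can be tuned independently of the others, which is the property the individual exploration relies on.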

Findings

01. Ursa shortens the data collection process by more than 128x.

02. Ursa's control plane is 43x faster than those of ML-driven approaches.

03. Ursa significantly reduces both SLA violations and CPU usage relative to ML-driven approaches.

Abstract

Resource management for cloud-native microservices has attracted a lot of recent attention. Previous work has shown that machine learning (ML)-driven approaches outperform traditional techniques, such as autoscaling, in terms of both SLA maintenance and resource efficiency. However, ML-driven approaches also face challenges, including lengthy data collection processes and limited scalability. We present Ursa, a lightweight resource management system for cloud-native microservices that addresses these challenges. Ursa uses an analytical model that decomposes the end-to-end SLA into per-service SLAs, and maps each per-service SLA to individual resource allocations per microservice tier. To speed up the exploration process and avoid prolonged SLA violations, Ursa explores each microservice individually, and swiftly stops exploration if latency exceeds its SLA. We evaluate Ursa on a set of…
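The per-service exploration with early stopping described in the abstract can be pictured with a minimal sketch. It assumes a hypothetical `measure_latency_ms` callback that reports a service's latency at a given CPU allocation, and it walks allocations from largest to smallest; the function names, the toy latency curve, and the stepping order are illustrative assumptions, not Ursa's actual procedure.

```python
def explore_service(measure_latency_ms, sla_ms, cpu_levels):
    """Probe one service at decreasing CPU allocations.

    Exploration stops at the first allocation whose latency exceeds the
    per-service SLA, so the service is not kept in violation while smaller
    allocations are tried. Returns the smallest allocation that met the
    SLA, or None if even the largest allocation violated it.
    """
    smallest_ok = None
    for cpu in sorted(cpu_levels, reverse=True):
        if measure_latency_ms(cpu) > sla_ms:
            break  # early stop: latency exceeded the per-service SLA
        smallest_ok = cpu
    return smallest_ok


def toy_latency_ms(cpu_cores):
    # Hypothetical latency curve: latency falls as more cores are allocated.
    return 40.0 / cpu_cores


smallest = explore_service(toy_latency_ms, sla_ms=25.0, cpu_levels=[1, 2, 4, 8])
print(smallest)
```

Because each service is probed against its own SLA, one exploration run touches a single tier at a time, and a violation ends that run immediately instead of extending data collection.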

Taxonomy

Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Software System Performance and Reliability