Cluster Resource Management for Dynamic Workloads by Online Optimization
Nader Alfares, George Kesidis, Ata Fatahi Baarzi, Aman Jain

TL;DR
This paper explores the use of simulated annealing for online resource management of dynamic, containerized microservice workloads, effectively balancing performance and cost.
Contribution
It demonstrates the effectiveness of simulated annealing for online resource optimization across various workload scenarios and discusses its adaptability to other resource management problems.
Findings
Simulated annealing effectively balances performance and cost in resource management.
Case studies show improved service selection and container sizing.
The approach is adaptable to different workload types and management objectives.
Abstract
Over the past ten years, many different approaches have been proposed for different aspects of the problem of resource management for long-running, dynamic and diverse workloads such as processing query streams or distributed deep learning. Particularly for applications consisting of containerized microservices, researchers have attempted to address problems of dynamic selection of, for example: types and quantities of virtualized services (e.g., IaaS/VMs), vertical and horizontal scaling of different microservices, assigning microservices to VMs, task scheduling, or some combination thereof. In this context, we argue that frameworks like simulated annealing are highly suitable for online navigation of trade-offs between performance (SLO) and cost, particularly when the complex workloads and cloud-service offerings vary over time. Based on a macroscopic objective that combines both…
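To make the idea concrete, here is a minimal sketch of simulated annealing over container sizing and replica counts for a chain of microservices, re-run each epoch as load changes. Everything in it is a hypothetical illustration, not the authors' method: the CPU sizes, prices, toy M/M/1-style latency model, the form of the "cost plus weighted SLO violation" objective, and the names `latency`, `objective`, `neighbor`, and `anneal` are all assumptions.

```python
import math
import random

# All constants below are hypothetical illustration values.
CPU_SIZES = [0.5, 1.0, 2.0, 4.0]  # container CPU sizes (cores)
MAX_REPLICAS = 8
PRICE_PER_CORE = 0.05             # cost per core per hour
RATE_PER_CORE = 100.0             # requests/s one core can serve
SLO_SECONDS = 0.05                # end-to-end latency target
SLO_WEIGHT = 100.0                # penalty weight on SLO violation
OVERLOAD_LATENCY = 1e3            # finite stand-in for an overloaded service


def latency(config, loads):
    """Toy end-to-end latency: sum of M/M/1-style delays along a service chain."""
    total = 0.0
    for (size_idx, replicas), lam in zip(config, loads):
        mu = CPU_SIZES[size_idx] * replicas * RATE_PER_CORE
        if lam >= mu:
            return OVERLOAD_LATENCY  # unstable queue: use a large finite penalty
        total += 1.0 / (mu - lam)
    return total


def objective(config, loads):
    """Hypothetical macroscopic objective: cost plus weighted SLO violation."""
    cost = PRICE_PER_CORE * sum(CPU_SIZES[s] * r for s, r in config)
    violation = max(0.0, latency(config, loads) - SLO_SECONDS)
    return cost + SLO_WEIGHT * violation


def neighbor(config, rng):
    """Perturb one microservice: step its container size or replica count by one."""
    new = list(config)
    i = rng.randrange(len(new))
    size_idx, replicas = new[i]
    if rng.random() < 0.5:  # vertical scaling move
        size_idx = min(max(size_idx + rng.choice((-1, 1)), 0), len(CPU_SIZES) - 1)
    else:                   # horizontal scaling move
        replicas = min(max(replicas + rng.choice((-1, 1)), 1), MAX_REPLICAS)
    new[i] = (size_idx, replicas)
    return new


def anneal(start, loads, steps=5000, t0=1.0, cooling=0.999, seed=0):
    """Simulated annealing with the Metropolis rule and geometric cooling."""
    rng = random.Random(seed)
    current, cur_obj = list(start), objective(start, loads)
    best, best_obj = current, cur_obj
    temp = t0
    for _ in range(steps):
        cand = neighbor(current, rng)
        cand_obj = objective(cand, loads)
        delta = cand_obj - cur_obj
        # Always accept improvements; accept worse moves with prob exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cur_obj = cand, cand_obj
            if cur_obj < best_obj:
                best, best_obj = current, cur_obj
        temp *= cooling
    return best, best_obj


# Online use (hypothetical): re-anneal each epoch, warm-starting from the deployed config.
config = [(1, 2), (1, 2), (1, 2)]
for epoch, loads in enumerate([[150, 80, 120], [300, 160, 240]]):
    config, obj = anneal(config, loads, seed=epoch)
    print(epoch, config, round(obj, 4))
```

Warm-starting each epoch from the currently deployed configuration is one plausible way to make the search "online": when the workload shifts (here, load roughly doubles in the second epoch), the annealer only has to move the deployment to a nearby configuration that is feasible under the new load instead of searching from scratch.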
Taxonomy
Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Software System Performance and Reliability
