ReLeaSER: A Reinforcement Learning Strategy for Optimizing Utilization Of Ephemeral Cloud Resources
Mohamed Handaoui, Jean-Emile Dartois, Jalil Boukhobza, Olivier Barais, and Laurent d'Orazio

TL;DR
ReLeaSER employs reinforcement learning to dynamically optimize safety margins for ephemeral cloud resources, significantly reducing SLA violations and increasing potential cost savings by adapting to workload variations.
Contribution
The paper introduces ReLeaSER, a novel RL-based approach that dynamically adjusts safety margins at the host level to improve resource utilization and SLA compliance.
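This page does not spell out ReLeaSER's state, action, or reward design, so the sketch below is not the authors' method. It swaps in plain tabular Q-learning under assumptions of our own: the state is a coarse bucket of the most recent relative prediction error, the action is one of a few discrete safety margins, and the reward is revenue from reclaimed capacity minus a flat SLA penalty. The names `MarginAgent`, `bucket`, `reward`, `train`, the margin set, the bucket edges, the synthetic workload, and all hyperparameters are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical action set: relative safety margins on top of the forecast.
MARGINS = [0.0, 0.1, 0.2, 0.3]
# Hypothetical bucket edges for the last relative prediction error.
ERROR_EDGES = [-0.1, 0.0, 0.1]


def bucket(error: float) -> int:
    """Map the last relative prediction error to a discrete state (0..3)."""
    return sum(error > edge for edge in ERROR_EDGES)


def reward(capacity, forecast, actual, margin, price=1.0, penalty=30.0):
    """Revenue from reclaimed capacity minus a flat SLA penalty (assumed model)."""
    kept = min(capacity, forecast * (1.0 + margin))
    reclaimed = capacity - kept
    return price * reclaimed - (penalty if actual > kept else 0.0)


class MarginAgent:
    """Tabular Q-learning over (error bucket, margin index) pairs."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(lambda: [0.0] * len(MARGINS))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state: int) -> int:
        # Epsilon-greedy: explore a random margin, otherwise take the best known one.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(MARGINS))
        values = self.q[state]
        return max(range(len(MARGINS)), key=values.__getitem__)

    def update(self, state, action, r, next_state):
        # Standard one-step Q-learning update toward r + gamma * max_a' Q(s', a').
        target = r + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def train(steps=20000, capacity=100.0, seed=0):
    rng = random.Random(seed)
    agent = MarginAgent(seed=seed)
    err = 0.0
    state = bucket(err)
    for _ in range(steps):
        action = agent.act(state)
        # Synthetic, autocorrelated forecast error (assumption), clamped to +/-50%.
        err = max(-0.5, min(0.5, 0.8 * err + rng.gauss(0.0, 0.05)))
        actual = rng.uniform(40.0, 70.0)
        # Chosen so that (actual - forecast) / forecast equals err, up to rounding.
        forecast = actual / (1.0 + err)
        r = reward(capacity, forecast, actual, MARGINS[action])
        next_state = bucket(err)
        agent.update(state, action, r, next_state)
        state = next_state
    return agent
```

Because the synthetic next state here does not depend on the chosen margin, this toy problem is close to a contextual bandit; the discount factor is kept only to show the general Q-learning update. After `agent = train()`, the greedy margin for bucket `s` is `MARGINS[max(range(len(MARGINS)), key=agent.q[s].__getitem__)]`. With autocorrelated errors one would expect larger margins to be favored after under-predictions (high buckets), but that follows from the synthetic data, not from any result reported in the paper.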
Findings
Reduces SLA violation penalties by up to 3.4x.
Improves potential savings by up to 43.6%.
Learns from past prediction errors to optimize resource management.
Abstract
Cloud data center capacities are over-provisioned to handle demand peaks and hardware failures, which leads to low resource utilization. One way to improve resource utilization, and thus reduce the total cost of ownership, is to offer unused resources (referred to as ephemeral resources) at a lower price. However, resold resources must meet their customers' Quality of Service expectations. The goal is thus to maximize the amount of reclaimed resources while avoiding SLA penalties. To achieve this, cloud providers have to estimate their future utilization in order to provide availability guarantees. The prediction should include a safety margin so that resources can absorb unpredictable workloads. The challenge is to find the safety margin that provides the best trade-off between the amount of resources to reclaim and the risk of SLA violations. Most state-of-the-art solutions…
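To make the trade-off concrete, here is a minimal sketch under simplifying assumptions that are ours, not the paper's: a single resource on one host, a forecast of its used capacity, and a safety margin expressed as a fraction of that forecast. Capacity above the forecast plus margin is offered as ephemeral resources, and an SLA violation is counted whenever actual usage exceeds the capacity the provider kept back. The names `reclaimable`, `evaluate_margin`, `Outcome`, `price`, and `penalty`, and the flat-penalty cost model, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    reclaimed: float  # capacity offered as ephemeral resources
    violated: bool    # True if actual usage exceeded the capacity kept back
    profit: float     # revenue from reclaimed capacity minus any SLA penalty


def reclaimable(capacity: float, forecast: float, margin: float) -> float:
    """Capacity left after reserving the forecast plus a relative safety margin."""
    reserved = forecast * (1.0 + margin)
    return max(0.0, capacity - reserved)


def evaluate_margin(capacity, forecast, actual, margin, price=1.0, penalty=10.0):
    """Score one margin choice against the usage that actually materialized."""
    reclaimed = reclaimable(capacity, forecast, margin)
    kept = capacity - reclaimed
    violated = actual > kept
    profit = price * reclaimed - (penalty if violated else 0.0)
    return Outcome(reclaimed, violated, profit)


if __name__ == "__main__":
    for m in (0.125, 0.25):
        print(m, evaluate_margin(100.0, 64.0, 75.0, m))
```

With a capacity of 100, a forecast of 64, and an actual usage of 75, a 12.5% margin reclaims 28 units but triggers a violation (profit 18 with the assumed penalty of 10), while a 25% margin reclaims only 20 and avoids it (profit 20). With a penalty below 8 the ranking flips, which is the trade-off the abstract describes: the best margin depends on both the forecast error and the cost of a violation.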
