ReLeaSER: A Reinforcement Learning Strategy for Optimizing Utilization Of Ephemeral Cloud Resources

Mohamed Handaoui, Jean-Emile Dartois, Jalil Boukhobza, Olivier Barais, and Laurent d'Orazio

arXiv:2009.11208 · cs.PF · December 11, 2020

TL;DR

ReLeaSER uses reinforcement learning to dynamically optimize the safety margins kept when reclaiming ephemeral cloud resources. By adapting to workload variations, it reduces SLA violation penalties (by up to 3.4x) and increases potential cost savings (by up to 43.6%).

Contribution

The paper introduces ReLeaSER, a novel RL-based approach that dynamically adjusts safety margins at the host level to improve resource utilization while maintaining SLA compliance.
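As a rough illustration of what "learning a safety margin" can mean, the sketch below replaces an RL agent with a much simpler, stateless epsilon-greedy multi-armed bandit. It is not the authors' method; this summary does not give ReLeaSER's state, action, or reward definitions. In each window the agent picks one margin from a hypothetical discrete set and is rewarded with the capacity it could resell, minus a hypothetical flat penalty when the host's own load overflows into the resold share. The capacity, forecast range, Gaussian forecast error, penalty, and margin values are all made-up assumptions, and the names `reward`, `EpsilonGreedyMargin`, and `simulate` are illustrative.

```python
import random

# Hypothetical settings, not taken from the paper.
MARGINS = [0.0, 0.5, 1.0, 2.0]  # candidate safety margins, in CPU cores
PENALTY = 5.0                   # flat SLA penalty per violating window


def reward(capacity, predicted, actual, margin, penalty=PENALTY):
    """Resold capacity minus a penalty if the host's own load needed it back."""
    reclaimed = max(0.0, capacity - predicted - margin)
    violated = actual > capacity - reclaimed
    return reclaimed - (penalty if violated else 0.0)


class EpsilonGreedyMargin:
    """Stateless epsilon-greedy bandit over a fixed list of margins.

    A simplified stand-in for an RL agent, not ReLeaSER's design.
    """

    def __init__(self, margins, epsilon=0.1, seed=0):
        self.margins = list(margins)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * len(self.margins)
        self.values = [0.0] * len(self.margins)  # running mean reward per margin

    def select(self):
        # Try every margin once before trusting the estimates.
        for i, c in enumerate(self.counts):
            if c == 0:
                return i
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.margins))  # explore
        return max(range(len(self.margins)), key=lambda i: self.values[i])  # exploit

    def update(self, i, r):
        self.counts[i] += 1
        self.values[i] += (r - self.values[i]) / self.counts[i]  # incremental mean


def simulate(windows=2000, capacity=10.0, seed=1):
    """Run the agent against synthetic forecasts with Gaussian errors."""
    env = random.Random(seed)
    agent = EpsilonGreedyMargin(MARGINS, seed=seed)
    for _ in range(windows):
        predicted = env.uniform(4.0, 7.0)
        actual = predicted + env.gauss(0.0, 0.8)  # forecast error ~ N(0, 0.8)
        i = agent.select()
        agent.update(i, reward(capacity, predicted, actual, agent.margins[i]))
    return agent


if __name__ == "__main__":
    agent = simulate()
    best = max(range(len(agent.margins)), key=lambda i: agent.values[i])
    print("learned margin:", agent.margins[best])
```

For example, `reward(10.0, 6.0, 6.5, 0.0)` returns `-1.0` (4 cores resold, but the load of 6.5 overflows the 6 cores kept), while a margin of `1.0` returns `3.0`. A stateless bandit can only settle on one margin for all conditions; a full RL formulation would condition the choice on a state, such as recent prediction errors.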

Findings

1. Reduces SLA violation penalties by up to 3.4x.
2. Improves potential savings by up to 43.6%.
3. Learns from past prediction errors to optimize resource management.

Abstract

Cloud data center capacities are over-provisioned to handle demand peaks and hardware failures, which leads to low resource utilization. One way to improve resource utilization, and thus reduce the total cost of ownership, is to offer unused resources (referred to as ephemeral resources) at a lower price. However, resold resources must still meet customers' expectations in terms of Quality of Service (QoS). The goal is thus to maximize the amount of reclaimed resources while avoiding SLA penalties. To achieve that, cloud providers have to estimate their future utilization to provide availability guarantees. The prediction should include a safety margin so that resources can absorb unpredictable workloads. The challenge is to find the safety margin that provides the best trade-off between the amount of resources to reclaim and the risk of SLA violations. Most state-of-the-art solutions…
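The trade-off described in the abstract can be made concrete with a small accounting sketch. It assumes a simple model that is not taken from the paper: the provider resells whatever capacity remains after its forecast plus a fixed safety margin, and a window counts as an SLA violation when the actual load exceeds the capacity the provider kept for itself. The `Step` record, the `reclaimable` and `evaluate` helpers, and the three-window trace are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    capacity: float   # total host capacity (hypothetical units, e.g. cores)
    predicted: float  # forecast utilization for the window
    actual: float     # utilization actually observed in the window


def reclaimable(capacity: float, predicted: float, margin: float) -> float:
    """Capacity left after the forecast plus the safety margin, floored at zero."""
    return max(0.0, capacity - predicted - margin)


def evaluate(steps: list[Step], margin: float) -> tuple[float, int]:
    """Return (total reclaimed capacity, number of violating windows)."""
    total, violations = 0.0, 0
    for s in steps:
        r = reclaimable(s.capacity, s.predicted, margin)
        total += r
        # Violation: the provider's own load no longer fits in what it kept.
        if s.actual > s.capacity - r:
            violations += 1
    return total, violations


# Hypothetical three-window trace on a 10-core host.
steps = [Step(10, 6, 6.5), Step(10, 5, 7), Step(10, 7, 6)]
for margin in (0.0, 1.0, 2.5):
    print(margin, evaluate(steps, margin))
# 0.0 (12.0, 2)
# 1.0 (9.0, 1)
# 2.5 (4.5, 0)
```

On this toy trace, raising the margin from 0 to 2.5 cores removes both violations but shrinks the reclaimed capacity from 12 to 4.5 core-windows, which is the tension a margin-selection policy has to resolve.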
