MSARS: A Meta-Learning and Reinforcement Learning Framework for SLO Resource Allocation and Adaptive Scaling for Microservices

Kan Hu; Linfeng Wen; Minxian Xu; Kejiang Ye

arXiv:2409.14953·cs.DC·September 24, 2024

Open Access

TL;DR

MSARS is a novel framework combining meta-learning and reinforcement learning to rapidly allocate resources and adaptively scale microservices, reducing SLO violations and resource costs in dynamic cloud environments.

Contribution

The paper introduces MSARS, a framework that integrates graph neural networks, meta-learning, and improved reinforcement learning for efficient SLO resource allocation and microservice auto-scaling.

Findings

01 MSARS reduces adaptation time by 40% compared to existing methods.

02 It achieves a 38% reduction in SLO violations.

03 Resource costs are decreased by 8% with MSARS.

Abstract

Service Level Objectives (SLOs) set thresholds on service time in cloud services to ensure acceptable quality of service (QoS) and user satisfaction. Many current studies treat SLOs as a system resource to be allocated so that QoS meets the SLOs. Existing microservice auto-scaling frameworks that rely on SLO resources often use complex, computationally intensive models that require significant time and resources to determine an appropriate resource allocation. This paper aims to allocate SLO resources rapidly and minimize resource costs while ensuring that application QoS meets the SLO requirements in a dynamically changing microservice environment. We propose MSARS, a framework that leverages meta-learning to quickly derive SLO resource allocation strategies and employs reinforcement learning for adaptive scaling of microservice resources. It features three innovative…

Taxonomy

Topics: Cloud Computing and Resource Management · Software System Performance and Reliability · IoT and Edge/Fog Computing