AI-based Resource Allocation: Reinforcement Learning for Adaptive Auto-scaling in Serverless Environments
Lucia Schuler, Somaya Jamil, Niklas Kühl

TL;DR
This paper explores using reinforcement learning to optimize auto-scaling in serverless computing, demonstrating improved performance over default configurations by learning effective policies for workload-specific resource management.
Contribution
It introduces a reinforcement learning approach for request-based auto-scaling in serverless environments, addressing the challenge of optimal concurrency configuration.
Findings
Reinforcement learning effectively learns scaling policies within limited iterations.
The approach improves performance compared to default auto-scaling settings.
Workload-specific policies enhance service quality in serverless platforms.
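To make the idea of learning a concurrency policy concrete, here is a minimal sketch using tabular Q-learning. It is an illustration under stated assumptions, not the authors' setup: this summary does not specify the algorithm, state space, or reward, so the action set, the candidate concurrency levels, and the `toy_throughput` reward curve are all hypothetical stand-ins.

```python
import random

# Hypothetical action set: lower, keep, or raise the concurrency limit by 20.
ACTIONS = (-20, 0, 20)
# Hypothetical candidate concurrency limits: 10, 30, ..., 110.
LEVELS = tuple(range(10, 111, 20))


def toy_throughput(concurrency):
    """Made-up reward that peaks at a concurrency of 70 (illustration only)."""
    return -abs(concurrency - 70)


def step(state, action):
    """Apply an action, clamping the new limit to the allowed range."""
    nxt = min(max(state + action, LEVELS[0]), LEVELS[-1])
    return nxt, toy_throughput(nxt)


def train(episodes=200, steps=20, alpha=1.0, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.

    alpha=1.0 is reasonable here only because the toy environment is
    deterministic; a noisy real workload would need a smaller step size.
    """
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in LEVELS for a in ACTIONS}
    for _ in range(episodes):
        state = rng.choice(LEVELS)
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            nxt, reward = step(state, action)
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q


def greedy_rollout(q, start, steps=10):
    """Follow the learned greedy policy and return the final concurrency limit."""
    state = start
    for _ in range(steps):
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        state, _ = step(state, action)
    return state
```

Calling `greedy_rollout(train(), 10)` follows the learned policy from the lowest limit toward the level the toy reward favors. In a real deployment the reward would come from measured service metrics rather than a fixed formula.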
Abstract
Serverless computing has emerged in recent years as a compelling new paradigm among cloud computing models. It promises users services at large scale and low cost while eliminating the need for infrastructure management. On the cloud provider side, flexible resource management is required to meet fluctuating demand; it can be enabled through automated provisioning and deprovisioning of resources. A common approach among both commercial and open-source serverless computing platforms is workload-based auto-scaling, where a designated algorithm scales instances according to the number of incoming requests. In the recently evolving serverless framework Knative, a request-based policy is proposed, where the algorithm scales resources according to a configured maximum number of requests that can be processed in parallel per instance, the so-called concurrency. As we show in a baseline experiment, this…
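The request-based policy described in the abstract can be illustrated with a small calculation. The sketch below is a simplification under assumptions: the function name `desired_instances` and its parameters are hypothetical, and a production autoscaler such as Knative's would also average the load over time windows and apply further limits rather than react to a single reading.

```python
import math


def desired_instances(in_flight_requests, concurrency_limit, min_instances=0):
    """Estimate how many instances are needed so that no instance handles
    more than `concurrency_limit` requests in parallel.

    All names here are illustrative and not taken from any platform's API.
    """
    if concurrency_limit <= 0:
        raise ValueError("concurrency_limit must be positive")
    needed = math.ceil(in_flight_requests / concurrency_limit)
    return max(min_instances, needed)
```

For example, with 250 requests in flight and a concurrency limit of 100, the sketch asks for three instances. Lowering the limit to 50 asks for five. The concurrency setting therefore directly trades instance count against per-instance load.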
