An SLO Driven and Cost-Aware Autoscaling Framework for Kubernetes

Vinoth Punniyamoorthy; Bikesh Kumar; Sumit Saha; Lokesh Butra; Mayilsamy Palanigounder; Akash Kumar Agarwal; Kabilan Kannan

arXiv:2512.23415 · cs.SE · December 30, 2025

TL;DR

This paper proposes an AIOps-driven autoscaling framework for Kubernetes that balances SLO adherence against cost efficiency. It reports shorter SLO violations, faster scaling response, and lower infrastructure cost than default autoscaling methods.

Contribution

It introduces a safe, explainable multi-signal autoscaling approach that integrates SLO-awareness and demand forecasting for Kubernetes.

Findings

01 Reduces SLO violation duration by up to 31%

02 Improves scaling response time by 24%

03 Lowers infrastructure cost by 18%

Abstract

Kubernetes provides native autoscaling mechanisms, including the Horizontal Pod Autoscaler, Vertical Pod Autoscaler, and node-level autoscalers, to enable elastic resource management for cloud-native applications. However, production environments frequently experience Service Level Objective violations and cost inefficiencies due to reactive scaling behavior, limited use of application-level signals, and opaque control logic. This paper investigates how Kubernetes autoscaling can be enhanced using AIOps principles to jointly satisfy SLO and cost constraints under diverse workload patterns without compromising safety or operational transparency. We present a gap-driven analysis of existing autoscaling approaches and propose a safe and explainable multi-signal autoscaling framework that integrates SLO-aware and cost-conscious control with lightweight demand forecasting. Experimental…
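To make the idea of multi-signal, SLO-aware, cost-conscious scaling concrete, here is a minimal Python sketch. It is not the paper's algorithm, which this page does not describe. It assumes three illustrative signals (CPU utilization, p95 latency against an SLO threshold, and a naive one-step demand forecast), applies the ratio rule documented for the Horizontal Pod Autoscaler, ceil(currentReplicas × currentMetric / targetMetric), to the first two, and caps the result with a replica limit standing in for a cost budget. All names (`Signals`, `forecast_rps`, `decide_replicas`) and parameters are hypothetical, and scaling replicas linearly with latency is a deliberate simplification.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Signals:
    """Hypothetical inputs for one scaling decision (not from the paper)."""
    current_replicas: int
    cpu_utilization: float    # observed average CPU utilization, 0..1
    cpu_target: float         # target CPU utilization, 0..1
    p95_latency_ms: float     # observed p95 latency
    slo_latency_ms: float     # latency SLO threshold
    recent_rps: List[float]   # recent request rates, oldest first
    rps_per_replica: float    # assumed capacity of a single replica
    max_replicas: int         # replica cap standing in for a cost budget


def forecast_rps(history):
    """Naive forecast: extrapolate one step from the last two samples."""
    if len(history) < 2:
        return history[-1] if history else 0.0
    return max(0.0, history[-1] + (history[-1] - history[-2]))


def decide_replicas(s):
    """Return (replicas, reasons), exposing every signal for explainability."""
    candidates = {
        # HPA-style ratio rule applied to CPU utilization.
        "cpu": math.ceil(s.current_replicas * s.cpu_utilization / s.cpu_target),
        # Same ratio rule applied to latency versus the SLO (a simplification).
        "latency_slo": math.ceil(
            s.current_replicas * s.p95_latency_ms / s.slo_latency_ms
        ),
        # Replicas needed to serve the forecast request rate.
        "forecast": math.ceil(forecast_rps(s.recent_rps) / s.rps_per_replica),
    }
    driver = max(candidates, key=candidates.get)  # most demanding signal
    wanted = max(1, candidates[driver])           # never scale to zero
    replicas = min(wanted, s.max_replicas)        # enforce the cost cap
    reasons = dict(candidates, driver=driver, capped=wanted > s.max_replicas)
    return replicas, reasons
```

Returning the per-signal candidates alongside the decision is one simple way to keep the control logic transparent: an operator can see which signal drove the replica count and whether the cost cap overrode it.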

Taxonomy

Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Software-Defined Networks and 5G