Mitigating Temporal Blindness in Kubernetes Autoscaling: An Attention-Double-LSTM Framework
Faraz Shaikh, Gianluca Reali, Mauro Femminella

TL;DR
This paper embeds an attention-enhanced double-stacked LSTM in a Proximal Policy Optimization (PPO) reinforcement learning agent for Kubernetes autoscaling. The design targets long-term workload dependencies and aims to reduce latency and resource flapping in edge computing.
Contribution
The paper presents a stability-aware autoscaling method that combines workload forecasting with attention-based recurrent memory, and reports that it outperforms existing reactive and RL-based controllers.
Findings
Reduces 90th percentile latency by approximately 29%.
Decreases replica churn by 39%.
Outperforms the industry-standard HPA as well as RL-based baselines.
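As a reading aid for the figures above, here is a minimal Python sketch of how a 90th percentile latency, a replica churn count, and a relative reduction could be computed from logged traces. The function names are hypothetical, and the churn definition (sum of absolute changes in replica count between consecutive steps) is an assumption; this summary does not state how the paper defines or measures either metric.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile: smallest sample with at least q% of samples at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    # Rank is 1-based; clamp to 1 so very small q still returns the minimum.
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def replica_churn(replica_counts):
    """Assumed churn definition: total absolute change in replicas between consecutive steps."""
    return sum(abs(b - a) for a, b in zip(replica_counts, replica_counts[1:]))


def relative_reduction(baseline, candidate):
    """Fractional improvement of candidate over baseline (0.29 means 29% lower)."""
    return (baseline - candidate) / baseline


if __name__ == "__main__":
    # Illustrative numbers only, not data from the paper.
    latencies_ms = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    replicas = [2, 4, 3, 3, 6]
    print(percentile(latencies_ms, 90))
    print(replica_churn(replicas))
```

Under this definition, a controller that oscillates between replica counts accumulates churn even when its average replica count is unchanged, which is why churn is a natural proxy for resource flapping.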
Abstract
In the emerging landscape of edge computing, the stochastic and bursty nature of serverless workloads presents a critical challenge for autonomous resource orchestration. Traditional reactive controllers, such as the Kubernetes Horizontal Pod Autoscaler (HPA), suffer from inherent reaction latency, leading to Service Level Objective (SLO) violations during traffic spikes and resource flapping during ramp-downs. While Deep Reinforcement Learning (DRL) offers a pathway toward proactive management, standard agents suffer from temporal blindness, an inability to effectively capture long-term dependencies in non-Markovian edge environments. To bridge this gap, we propose a novel stability-aware autoscaling framework unifying workload forecasting and control via an Attention-Enhanced Double-Stacked LSTM architecture integrated within a Proximal Policy Optimization (PPO) agent. Unlike shallow…
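The reaction latency attributed to HPA above follows from its documented scaling rule, which uses only the currently observed metric: desired replicas equal the ceiling of current replicas times the ratio of the current metric to its target, with no change while that ratio stays within a tolerance (0.1 by default). The sketch below is a simplified illustration of that rule, not the paper's controller. The function name is hypothetical, and the sketch leaves out min/max replica bounds, stabilization windows, and pod readiness handling.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Simplified HPA rule: scale in proportion to the observed/target metric ratio."""
    ratio = current_metric / target_metric
    # Within the tolerance band the controller leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # Illustrative values: 4 replicas, 60% CPU utilization target.
    print(hpa_desired_replicas(4, 90, 60))  # load already above target
    print(hpa_desired_replicas(4, 63, 60))  # inside the tolerance band
    print(hpa_desired_replicas(4, 30, 60))  # load below target
```

Because the rule has no forecast term, it can only respond after the metric has already moved away from the target, which is the gap a proactive, forecasting-based controller is meant to close.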
