Predictive Autoscaling for Node.js on Kubernetes: Lower Latency, Right-Sized Capacity
Ivan Tymoshenko, Luca Maraschi, Matteo Collina

TL;DR
This paper introduces a predictive autoscaling algorithm for Node.js workloads on Kubernetes that anticipates load changes to maintain low latency and avoid over-provisioning, outperforming existing reactive methods.
Contribution
The authors develop a novel predictive scaling approach using aggregate metrics and a metric model, enabling proactive capacity adjustments for Node.js on Kubernetes.
Findings
Median latency of 26 ms with the new algorithm, versus 154 ms with KEDA and 522 ms with HPA
The approach maintains per-instance load near target thresholds during load changes
Cluster-wide aggregate metrics enable stable, accurate load forecasting
Abstract
Kubernetes offers two default paths for scaling Node.js workloads, and both have structural limitations. The Horizontal Pod Autoscaler scales on CPU utilization, which does not directly measure event loop saturation: a Node.js pod can queue requests and miss latency SLOs while CPU reports moderate usage. KEDA extends HPA with richer triggers, including event-loop metrics, but inherits the same reactive control loop, detecting overload only after it has begun. By the time new pods start and absorb traffic, the system may already be degraded. Lowering thresholds shifts the operating point but does not change the dynamic: the scaler still reacts to a value it has already crossed, at the cost of permanent over-provisioning. We propose a predictive scaling algorithm that forecasts where load will be by the time new capacity is ready and scales proactively based on that forecast.…
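To make the contrast between reactive and predictive sizing concrete, here is a minimal sketch in TypeScript. It is not the authors' metric model, which this excerpt does not describe: a plain least-squares linear trend stands in for the forecast, and every name here (`Sample`, `fitLine`, `forecastLoad`, `replicasFor`), the load unit, the per-pod target of 0.7, and the 20-second startup lead time are assumptions chosen for illustration.

```typescript
// Hypothetical sketch: a linear trend is used as a stand-in forecast.
// It is NOT the paper's metric model; all names and numbers are illustrative.

interface Sample {
  timeSec: number;       // sample timestamp, in seconds
  aggregateLoad: number; // cluster-wide load summed over all pods (assumed unit)
}

// Ordinary least-squares fit of aggregateLoad against time.
function fitLine(samples: Sample[]): { slope: number; intercept: number } {
  const n = samples.length;
  if (n === 0) return { slope: 0, intercept: 0 };
  const meanT = samples.reduce((sum, p) => sum + p.timeSec, 0) / n;
  const meanL = samples.reduce((sum, p) => sum + p.aggregateLoad, 0) / n;
  let num = 0;
  let den = 0;
  for (const p of samples) {
    const dt = p.timeSec - meanT;
    num += dt * (p.aggregateLoad - meanL);
    den += dt * dt;
  }
  // With one sample, or all samples at the same instant, assume a flat trend.
  const slope = den === 0 ? 0 : num / den;
  return { slope, intercept: meanL - slope * meanT };
}

// Extrapolate the trend to the moment new pods are expected to be ready.
function forecastLoad(samples: Sample[], nowSec: number, leadTimeSec: number): number {
  const { slope, intercept } = fitLine(samples);
  return Math.max(0, slope * (nowSec + leadTimeSec) + intercept);
}

// Number of pods needed to keep per-pod load at or below the target.
function replicasFor(load: number, targetPerPod: number): number {
  return Math.max(1, Math.ceil(load / targetPerPod));
}

// Example: aggregate load rising by 0.04 per second.
const history: Sample[] = [
  { timeSec: 0, aggregateLoad: 2.0 },
  { timeSec: 10, aggregateLoad: 2.4 },
  { timeSec: 20, aggregateLoad: 2.8 },
  { timeSec: 30, aggregateLoad: 3.2 },
];
const nowSec = 30;
const leadTimeSec = 20;  // assumed time for a new pod to start and take traffic
const targetPerPod = 0.7; // assumed per-pod load target

// Reactive sizing uses the latest observed load; predictive sizing uses the forecast.
const reactiveReplicas = replicasFor(history[history.length - 1].aggregateLoad, targetPerPod);
const predictedLoad = forecastLoad(history, nowSec, leadTimeSec);
const predictiveReplicas = replicasFor(predictedLoad, targetPerPod);

console.log({ reactiveReplicas, predictedLoad, predictiveReplicas });
```

With these sample numbers, sizing on the current load asks for 5 pods, while sizing on the load forecast for 20 seconds ahead (about 4.0) asks for 6. That one-pod gap is the capacity a reactive scaler would only request after the threshold had already been crossed.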
