Morpheus: Lightweight RTT Prediction for Performance-Aware Load Balancing
Panagiotis Giannakopoulos, Bart van Knippenberg, Kishor Chandra Joshi, Nicola Calabretta, George Exarchakos

TL;DR
This paper presents lightweight RTT predictors trained on Kubernetes GPU cluster data to improve load balancing, significantly reducing latency and resource waste in distributed applications.
Contribution
Develops accurate, low-overhead RTT predictors using minimal metrics, enabling effective performance-aware load balancing in resource-constrained environments.
Findings
RTT predictors achieve up to 95% accuracy
Prediction delay remains within 10% of application RTT
Performance-aware load balancing reduces latency and resource waste
Abstract
Distributed applications increasingly demand low end-to-end latency, especially in edge and cloud environments where co-located workloads contend for limited resources. Traditional load-balancing strategies are typically reactive and rely on outdated or coarse-grained metrics, often leading to suboptimal routing decisions and increased tail latencies. This paper investigates the use of round-trip time (RTT) predictors to enhance request routing by anticipating application latency. We develop lightweight and accurate RTT predictors that are trained on time-series monitoring data collected from a Kubernetes-managed GPU cluster. By leveraging a reduced set of highly correlated monitoring metrics, our approach maintains low overhead while remaining adaptable to diverse co-location scenarios and heterogeneous hardware. The predictors achieve up to 95% accuracy while keeping the prediction…
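The approach outlined in the abstract (keep only the monitoring metrics most correlated with RTT, predict RTT from them, route each request to the replica with the lowest prediction) can be illustrated with a minimal sketch. It is not the authors' implementation: the metric names, the sample values, the replica names, and the single-feature least-squares line are all assumptions chosen for brevity, and the summary above does not say which model the paper uses.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def select_metrics(samples, rtts, k):
    """Keep the k metrics whose absolute correlation with RTT is highest."""
    ranked = sorted(samples, key=lambda m: abs(pearson(samples[m], rtts)), reverse=True)
    return ranked[:k]


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    vx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / vx
    return slope, my - slope * mx


def pick_replica(predicted_rtt_ms):
    """Route to the replica with the lowest predicted RTT."""
    return min(predicted_rtt_ms, key=predicted_rtt_ms.get)


# Hypothetical monitoring history (values invented for illustration).
history = {
    "gpu_util": [10, 20, 30, 40, 50],
    "disk_io": [5, 1, 4, 2, 3],
}
rtt_ms = [12, 22, 32, 42, 52]

chosen = select_metrics(history, rtt_ms, k=1)
slope, intercept = fit_line(history[chosen[0]], rtt_ms)

# Hypothetical current readings of the chosen metric on two replicas.
current = {"replica-a": 80, "replica-b": 35}
predicted = {name: slope * value + intercept for name, value in current.items()}
target = pick_replica(predicted)
```

In this toy data the GPU utilization series tracks RTT exactly, so it is the one metric retained, and the request goes to the replica with the lower reading. A real predictor would be trained on time-series data and would also have to keep its own inference delay small relative to the application RTT, as the findings above note.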
