LA-IMR: Latency-Aware, Predictive In-Memory Routing and Proactive Autoscaling for Tail-Latency-Sensitive Cloud Robotics

Eunil Seo; Chanh Nguyen; Erik Elmroth

arXiv:2505.07417 · cs.DC · May 13, 2025

TL;DR

LA-IMR is a control layer that combines latency modeling, predictive routing, and proactive autoscaling to reduce tail latency in cloud-edge inference workloads, cutting P99 latency by up to 20.7% in vision workloads.

Contribution

LA-IMR introduces a latency-aware control framework that integrates a closed-form latency model with event-driven scheduling and autoscaling for tail-latency-sensitive cloud robotics.

Findings

01. Reduces P99 latency by up to 20.7% in vision workloads.

02. Effectively mitigates long-tail latency spikes during bursty request streams.

03. Provides a principled foundation for tail-tolerant cloud-edge inference services.

Abstract

Hybrid cloud-edge infrastructures now support latency-critical workloads ranging from autonomous vehicles and surgical robotics to immersive AR/VR. However, they continue to experience crippling long-tail latency spikes whenever bursty request streams exceed the capacity of heterogeneous edge and cloud tiers. To address these long-tail latency issues, we present Latency-Aware, Predictive In-Memory Routing and Proactive Autoscaling (LA-IMR). This control layer integrates a closed-form, utilization-driven latency model with event-driven scheduling, replica autoscaling, and edge-to-cloud offloading to mitigate 99th-percentile (P99) delays. Our analytic model decomposes end-to-end latency into processing, network, and queuing components, expressing inference latency as an affine power-law function of instance utilization. Once calibrated, it produces two complementary functions that drive:…
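
The latency decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the functional form `base + scale * u**exponent`, the parameter names, the numeric values, and the choice to treat the processing component as the utilization-dependent inference latency are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyModel:
    """Hypothetical affine power-law model: base + scale * u**exponent."""

    base_ms: float    # assumed latency floor at zero utilization
    scale_ms: float   # assumed growth coefficient
    exponent: float   # assumed power-law exponent

    def inference_ms(self, utilization: float) -> float:
        """Inference (processing) latency at a utilization in [0, 1]."""
        if not 0.0 <= utilization <= 1.0:
            raise ValueError("utilization must be in [0, 1]")
        return self.base_ms + self.scale_ms * utilization ** self.exponent


def end_to_end_ms(model: LatencyModel, utilization: float,
                  network_ms: float, queuing_ms: float) -> float:
    """Sum the processing, network, and queuing components."""
    return model.inference_ms(utilization) + network_ms + queuing_ms


# Illustrative values only; a real deployment would calibrate them from measurements.
model = LatencyModel(base_ms=20.0, scale_ms=80.0, exponent=2.0)
total = end_to_end_ms(model, utilization=0.5, network_ms=5.0, queuing_ms=10.0)
```

With these made-up parameters, latency grows slowly at low utilization and steeply as the instance approaches saturation, which is the kind of behavior a routing or autoscaling policy would need to anticipate.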

Taxonomy

Topics: IoT and Edge/Fog Computing · Robotics and Automated Systems · Software-Defined Networks and 5G