Cloud Is Closer Than It Appears: Revisiting the Tradeoffs of Distributed Real-Time Inference
Pragya Sharma, Hang Qiu, Mani Srivastava

TL;DR
This paper challenges the assumption that cloud-based inference is unsuitable for real-time control in cyber-physical systems, showing that with high-throughput resources, cloud inference can meet or exceed on-device performance.
Contribution
The paper introduces a formal model of distributed inference latency and identifies conditions under which cloud inference outperforms on-device inference in safety-critical tasks.
Findings
Cloud platforms can match or surpass on-device inference latency when provisioned with enough compute throughput to amortize network and queueing delays.
The model accurately predicts latency based on sensing frequency, network delay, and platform throughput.
Simulations in autonomous driving scenarios validate the model's effectiveness.
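
The tradeoff in the findings above can be illustrated with a toy calculation. The sketch below is not the paper's formal model, which this summary does not reproduce. It assumes a deterministic single-server pipeline in which frames arrive at the sensing frequency, each frame takes 1/throughput seconds to process, and the cloud path adds a fixed network delay. The function names and the example numbers are hypothetical.

```python
def steady_state_latency(sensing_hz, throughput_hz, network_delay_s=0.0):
    """Per-frame latency in seconds under a toy deterministic queue (assumption)."""
    if sensing_hz > throughput_hz:
        # Frames arrive faster than they are served, so the backlog grows forever.
        return float("inf")
    # No queueing when the platform keeps up: latency is network plus service time.
    return network_delay_s + 1.0 / throughput_hz


def cloud_wins(sensing_hz, device_hz, cloud_hz, network_delay_s):
    """True if the cloud path has lower per-frame latency than the on-device path."""
    cloud = steady_state_latency(sensing_hz, cloud_hz, network_delay_s)
    device = steady_state_latency(sensing_hz, device_hz)
    return cloud < device


if __name__ == "__main__":
    # Hypothetical numbers: 20 Hz camera, 25 Hz on-device model, 200 Hz cloud GPU.
    for delay in (0.02, 0.05):
        print(delay, cloud_wins(20, 25, 200, delay))
```

With these made-up numbers, the on-device path takes 40 ms per frame. The cloud path takes 5 ms of compute plus the network delay, so it comes out ahead at a 20 ms delay and falls behind at 50 ms. Real deployments also see variable network delay and contention, which this sketch ignores.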
Abstract
The increasing deployment of deep neural networks (DNNs) in cyber-physical systems (CPS) enhances perception fidelity, but imposes substantial computational demands on execution platforms, posing challenges to real-time control deadlines. Traditional distributed CPS architectures typically favor on-device inference to avoid network variability and contention-induced delays on remote platforms. However, this design choice places significant energy and computational demands on the local hardware. In this work, we revisit the assumption that cloud-based inference is intrinsically unsuitable for latency-sensitive control tasks. We demonstrate that, when provisioned with high-throughput compute resources, cloud platforms can effectively amortize network and queueing delays, enabling them to match or surpass on-device performance for real-time decision-making. Specifically, we develop a…
