SLIDE: Simultaneous Model Downloading and Inference at the Wireless Network Edge
Guanqiao Qu, Tao Li, Qian Chen, Xianhao Chen, Sheng Zhou

TL;DR
SLIDE enables real-time inference at the wireless network edge by allowing simultaneous model downloading and inference, optimizing resource allocation for improved throughput.
Contribution
The paper introduces SLIDE, a novel framework that enables concurrent model downloading and inference, with an efficient algorithm for resource optimization in multi-user wireless systems.
Findings
SLIDE significantly improves task throughput compared to traditional schemes.
The recursive dependency model captures the latency impact of layer-wise downloading and inference.
The proposed algorithm finds the optimal resource allocation in polynomial time.
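The recursive dependency mentioned in the findings can be illustrated with a small sketch. This is not the paper's formulation. It assumes a constant downlink rate, layers downloaded in model order, and a constant compute speed. The function and parameter names (`slide_latency`, `sequential_latency`, `layer_bits`, `layer_flops`) are hypothetical. Under those assumptions, a layer can start only once it has fully arrived and the previous layer has finished, so each layer's finish time depends on the one before it.

```python
from itertools import accumulate


def slide_latency(layer_bits, layer_flops, bandwidth_bps, compute_flops):
    """E2E latency when inference overlaps with downloading (illustrative only).

    Assumptions (not from the paper): fixed link rate, in-order layer
    download, fixed compute speed, no other queuing delays.
    """
    # Time at which each layer has fully arrived over the link.
    download_done = [bits / bandwidth_bps for bits in accumulate(layer_bits)]

    finish = 0.0  # finish time of the previous layer's inference
    for arrived, flops in zip(download_done, layer_flops):
        # Recursive dependency: wait for both the layer itself and the
        # output of the previous layer before computing.
        start = max(finish, arrived)
        finish = start + flops / compute_flops
    return finish


def sequential_latency(layer_bits, layer_flops, bandwidth_bps, compute_flops):
    """Baseline: download the whole model first, then run inference."""
    return sum(layer_bits) / bandwidth_bps + sum(layer_flops) / compute_flops


if __name__ == "__main__":
    bits = [8e6, 8e6, 8e6]    # hypothetical layer sizes in bits
    flops = [1e9, 1e9, 1e9]   # hypothetical per-layer compute in FLOPs
    print(slide_latency(bits, flops, 8e6, 2e9))
    print(sequential_latency(bits, flops, 8e6, 2e9))
```

In this toy setting the overlapped schedule hides all but the last layer's compute time behind the download. When compute is the bottleneck instead, the `max` resolves to the previous layer's finish time and the download is hidden.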
Abstract
To support on-device inference, the next-generation mobile networks are expected to support real-time model downloading services to mobile users. However, powerful AI models typically have large model sizes, resulting in excessive end-to-end (E2E) downloading-and-inference (DAI) latency. To address this issue, we propose a simultaneous model downloading and inference (SLIDE) framework, which allows users to perform inference with downloaded layers while simultaneously receiving the remaining layers of the model. To this end, we formulate a task throughput maximization problem by jointly optimizing model provisioning, spectrum bandwidth allocation, and computing resource allocation for multi-user downlink systems. Unlike traditional DAI frameworks, SLIDE introduces recursive dependencies across layers, where inference latency depends recursively on the downloading bandwidth and computing…
