Modular Foundation Model Inference at the Edge: Network-Aware Microservice Optimization
Juan Zhu, Zixin Wang, Shenghui Song, Jun Zhang, Khaled Ben Letaief

TL;DR
This paper presents a network-aware microservice framework for modular foundation model inference at the edge, balancing resource constraints and network variability to ensure reliable, real-time multimodal AI services.
Contribution
It introduces a two-tier deployment strategy combining static network-aware placement of core services with dynamic orchestration of light services, enhancing edge inference robustness.
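The static tier can be pictured as a small 0/1 placement program. The sketch below is a toy illustration under stated assumptions. It is not the authors' formulation. The node names, costs, latencies, and the hypothetical helper `place_core_service` are invented. The "sparsity constraint" is modeled as a cap on the number of replicas. Fault tolerance is modeled as a minimum number of in-budget replicas per user region. A tiny instance is solved by exhaustive search, whereas a real deployment would use an integer-programming solver.

```python
from itertools import combinations

# Hypothetical toy topology (illustrative numbers only): deployment cost of
# each edge node and latency (ms) from each node to each user region.
NODE_COST = {"n1": 4.0, "n2": 3.0, "n3": 5.0, "n4": 2.0}
LATENCY_MS = {
    "n1": {"r1": 10, "r2": 40},
    "n2": {"r1": 15, "r2": 20},
    "n3": {"r1": 35, "r2": 12},
    "n4": {"r1": 50, "r2": 18},
}

def place_core_service(max_replicas, min_cover, latency_budget_ms):
    """Exhaustively solve a tiny 0/1 placement program for one core service.

    Decision: binary x_n (host a replica on node n or not).
    Sparsity (assumed form): sum_n x_n <= max_replicas.
    Fault tolerance (assumed form): every region reaches >= min_cover replicas
    within latency_budget_ms, so with min_cover >= 2 a single node failure
    still leaves a fallback replica.
    Objective: minimise total deployment cost.
    Returns (best_subset, best_cost), or (None, inf) if infeasible.
    """
    regions = {r for lat in LATENCY_MS.values() for r in lat}
    best, best_cost = None, float("inf")
    for k in range(1, max_replicas + 1):
        for subset in combinations(sorted(NODE_COST), k):
            covered = all(
                sum(LATENCY_MS[n][r] <= latency_budget_ms for n in subset) >= min_cover
                for r in regions
            )
            cost = sum(NODE_COST[n] for n in subset)
            if covered and cost < best_cost:
                best, best_cost = subset, cost
    return best, best_cost

print(place_core_service(max_replicas=3, min_cover=2, latency_budget_ms=25))
# → (('n1', 'n2', 'n4'), 9.0)
```

Tightening the replica cap to 2 makes this instance infeasible. Region r1 alone needs both n1 and n2, and region r2 then needs a third nearby node. This is the kind of trade-off between placement sparsity and redundancy that the static tier has to resolve.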
Findings
Achieves an average on-time task completion rate above 84%.
Maintains robustness under increasing system load.
Balances deployment costs with performance guarantees.
Abstract
Foundation models (FMs) unlock unprecedented multimodal and multitask intelligence, yet their cloud-centric deployment precludes real-time responsiveness and compromises user privacy. Meanwhile, monolithic execution at the edge remains infeasible under stringent resource limits and uncertain network dynamics. To bridge this gap, we propose a microservice-based FM inference framework that exploits the intrinsic functional asymmetry between heavyweight core services and agile light services. Our two-tier deployment strategy ensures robust Quality of Service (QoS) under resource contention. Specifically, core services are placed statically via a long-term network-aware integer program with sparsity constraints to form a fault-tolerant backbone. On the other hand, light services are orchestrated dynamically by a low-complexity online controller that integrates effective capacity theory with…
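The abstract is cut off before it says what effective capacity theory is combined with. The following is therefore only a generic illustration of how effective capacity can drive a low-complexity admission check for a light service. It is not the paper's controller. It assumes i.i.d. per-slot service amounts S and uses the textbook definition EC(θ) = −(1/θ)·ln E[e^{−θS}]. It also uses the standard approximation P(delay > D) ≈ e^{−θ·λ·D}, with the non-empty-buffer probability taken as 1, for a source of constant rate λ. The helper names are hypothetical.

```python
import math

def effective_capacity(service_samples, theta):
    """Empirical effective capacity per slot: -(1/theta) * ln E[exp(-theta * S)].

    Assumes the per-slot service amounts S are i.i.d. (a standard simplification).
    """
    mgf = sum(math.exp(-theta * s) for s in service_samples) / len(service_samples)
    return -math.log(mgf) / theta

def required_qos_exponent(arrival_rate, deadline_slots, violation_prob):
    """Smallest theta with exp(-theta * rate * D) <= eps (buffer-busy prob. taken as 1)."""
    return -math.log(violation_prob) / (arrival_rate * deadline_slots)

def admit(service_samples, arrival_rate, deadline_slots, violation_prob):
    """Admit a light-service flow if the node's EC at the required exponent covers its rate.

    EC(theta) is non-increasing in theta, so EC(theta_req) >= rate implies the
    delay-violation decay is at least as fast as the target requires.
    """
    theta = required_qos_exponent(arrival_rate, deadline_slots, violation_prob)
    return effective_capacity(service_samples, theta) >= arrival_rate

# Two nodes with the same mean service (5 units/slot) but different variability.
print(admit([5.0] * 10, 4.0, 10, 0.01))   # → True
print(admit([0.0, 10.0], 4.0, 10, 0.01))  # → False
```

Both nodes have the same average throughput. The fluctuating node's effective capacity at the required exponent is only about 3.63 units per slot, so it is rejected. This is why a delay-aware controller cannot rely on mean capacity alone.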
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Software-Defined Networks and 5G
