Joint Partitioning and Placement of Foundation Models for Real-Time Edge AI
Aladin Djuhera, Fernando Koch, Alecio Binotto

TL;DR
This paper presents a dynamic framework for real-time partitioning and placement of foundation models in edge AI, enabling adaptive inference amidst fluctuating resources and requirements.
Contribution
It introduces a novel runtime-resolved partitioning and placement framework for foundation models, addressing the limitations of static configurations in volatile edge environments.
Findings
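The framework enables reactive inference composition that responds to infrastructure fluctuations
It integrates model-aware capacity profiling with dynamic graph re-partitioning and reallocation
It is demonstrated in a 6G multi-access edge computing use case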
Abstract
Inference over large-scale foundation models within heterogeneous edge environments necessitates a fundamentally reconfigurable orchestration substrate. Static partitioning of model layers presumes temporal stability across compute and network resources, which is misaligned with the volatility of real-world deployments. We introduce a framework in which both the spatial placement and internal segmentation of foundation models are elevated to runtime-resolved constructs. The orchestration problem is formalized as a constrained optimization over layer-wise assignments, subject to evolving latency, utilization, and privacy gradients. The framework implements reactive inference composition responsive to infrastructural fluctuations by integrating model-aware capacity profiling with dynamic graph re-partitioning and reallocation. We introduce architectural and algorithmic components, along…
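To make the abstract's "constrained optimization over layer-wise assignments" concrete, here is a minimal Python sketch. It is not the authors' algorithm: the `Layer` and `Node` types, the single uniform link bandwidth, and the exhaustive search are all illustrative assumptions, and only latency and memory are modeled (the utilization and privacy constraints mentioned in the abstract are left out). Re-solving when a node profile changes stands in for runtime re-partitioning.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical profiles; field names and units are assumptions for illustration.
@dataclass(frozen=True)
class Layer:
    gflops: float   # compute cost of the layer
    mem_gb: float   # memory needed to host the layer
    out_mb: float   # activation size passed to the next layer

@dataclass(frozen=True)
class Node:
    gflops_per_s: float  # currently available compute throughput
    mem_gb: float        # currently available memory

def latency(assign, layers, nodes, link_mb_per_s):
    """Return end-to-end latency in seconds, or None if a memory cap is violated."""
    used = [0.0] * len(nodes)
    for layer, n in zip(layers, assign):
        used[n] += layer.mem_gb
    if any(u > node.mem_gb for u, node in zip(used, nodes)):
        return None
    # Compute time of every layer on its assigned node.
    total = sum(l.gflops / nodes[n].gflops_per_s for l, n in zip(layers, assign))
    # Transfer time whenever consecutive layers sit on different nodes.
    for i in range(len(layers) - 1):
        if assign[i] != assign[i + 1]:
            total += layers[i].out_mb / link_mb_per_s
    return total

def best_assignment(layers, nodes, link_mb_per_s):
    """Exhaustive search over layer-to-node assignments (only viable for tiny cases)."""
    best = None
    for assign in product(range(len(nodes)), repeat=len(layers)):
        t = latency(assign, layers, nodes, link_mb_per_s)
        if t is not None and (best is None or t < best[0]):
            best = (t, assign)
    return best  # (latency_s, assignment) or None if nothing is feasible

layers = [Layer(10, 2, 4)] * 4
nodes = [Node(10, 4), Node(40, 8)]           # slow device, fast edge server
print(best_assignment(layers, nodes, 100))   # everything fits on the fast node

nodes[1] = Node(40, 4)                       # edge server loses half its memory
print(best_assignment(layers, nodes, 100))   # model must now be split
```

The search is exponential in the number of layers, so a real orchestrator would need a scalable solver or heuristic. The sketch only shows how a change in profiled capacity alters the feasible set and therefore the chosen partition.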
Taxonomy
Topics: IoT and Edge/Fog Computing · Advanced Neural Network Applications · Software-Defined Networks and 5G
