Resource-Efficient Generative AI Model Deployment in Mobile Edge Networks
Yuxin Liang, Peng Yang, Yuanyuan He, and Feng Lyu

TL;DR
This paper addresses the challenge of deploying resource-intensive generative AI models on mobile edge networks by proposing a collaborative edge-cloud framework and an optimization algorithm to balance resource use and delay.
Contribution
It introduces a novel resource-aware deployment framework and decision algorithm that optimize generative AI model placement on edge devices considering multidimensional resource constraints.
Findings
The proposed algorithm reduces deployment costs compared to baselines.
Model switching delay significantly impacts deployment efficiency.
Resource sharing improves overall system performance.
Abstract
The surging development of Artificial Intelligence-Generated Content (AIGC) marks a transformative era of content creation and production. Compared to cloud-based solutions, edge servers promise attractive benefits for hosting AIGC services, e.g., reduced service delay and backhaul traffic load. However, the scarcity of available resources on the edge poses significant challenges in deploying generative AI models. In this paper, by characterizing the resource and delay demands of typical generative AI models, we find that the consumption of storage and GPU memory, as well as the model switching delay represented by I/O delay during the preloading phase, are significant and vary across models. These multidimensional coupling factors render it difficult to make efficient edge model deployment decisions. Hence, we present a collaborative edge-cloud framework aiming to properly manage…
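To make the coupling between storage, GPU memory, and switching delay concrete, the following is a minimal sketch of a greedy edge placement heuristic. It is not the authors' algorithm, which the truncated abstract does not specify. The `Model` fields, the benefit measure (request rate times preload delay), the function name, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    storage_gb: float    # disk footprint on the edge server (assumed value)
    gpu_mem_gb: float    # GPU memory needed once loaded (assumed value)
    load_delay_s: float  # I/O delay to preload the model (assumed value)
    request_rate: float  # expected requests per unit time (assumed value)


def greedy_edge_placement(models, storage_budget_gb, gpu_budget_gb):
    """Return names of models to host at the edge under both budgets.

    Hypothetical heuristic: rank models by the switching delay they would
    save (request_rate * load_delay_s) per unit of normalized resource use,
    then place them in that order while both budgets allow.
    """
    def density(m):
        footprint = (m.storage_gb / storage_budget_gb
                     + m.gpu_mem_gb / gpu_budget_gb)
        return (m.request_rate * m.load_delay_s) / footprint

    placed = []
    storage_left, gpu_left = storage_budget_gb, gpu_budget_gb
    for m in sorted(models, key=density, reverse=True):
        # A model is placed only if it fits in BOTH resource dimensions.
        if m.storage_gb <= storage_left and m.gpu_mem_gb <= gpu_left:
            placed.append(m.name)
            storage_left -= m.storage_gb
            gpu_left -= m.gpu_mem_gb
    return placed


# Illustrative inputs only; none of these figures come from the paper.
models = [
    Model("A", storage_gb=4, gpu_mem_gb=6, load_delay_s=5, request_rate=10),
    Model("B", storage_gb=10, gpu_mem_gb=12, load_delay_s=20, request_rate=8),
    Model("C", storage_gb=2, gpu_mem_gb=3, load_delay_s=2, request_rate=3),
]
placed = greedy_edge_placement(models, storage_budget_gb=12, gpu_budget_gb=16)
print(placed)  # ['B', 'C']
```

With these inputs, model B is placed first because it saves the most preload delay per unit of resource, which leaves too little storage for A; C still fits in both dimensions. A greedy rule like this gives no optimality guarantee, which is one reason a dedicated decision algorithm is needed for the multidimensional case the abstract describes.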
Taxonomy
Topics: IoT and Edge/Fog Computing · Context-Aware Activity Recognition Systems
