Joint Foundation Model Caching and Inference of Generative AI Services for Edge Intelligence
Minrui Xu, Dusit Niyato, Hongliang Zhang, Jiawen Kang, Zehui Xiong, Shiwen Mao, and Zhu Han

TL;DR
This paper introduces a joint caching and inference framework for generative AI services at edge servers, optimizing resource use and latency by managing foundation models with a novel context-aware metric.
Contribution
It proposes a new framework and a novel Age of Context metric to efficiently cache and manage foundation models for edge AI, balancing latency, accuracy, and resource constraints.
Findings
The proposed least context caching algorithm reduces system costs.
Utilizing contextual information improves caching efficiency.
Numerical results show significant cost savings over baselines.
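To make the caching idea concrete, here is a minimal sketch of a least-context-first eviction policy. It is an illustration only, not the authors' algorithm: this page does not define the Age of Context metric, so the `context` score, the `decay` factor, the per-request aging rule, and the names `CachedModel` and `LeastContextCache` are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class CachedModel:
    size_gb: float          # GPU memory the model occupies (assumed unit)
    context: float = 0.0    # hypothetical freshness-weighted context score


@dataclass
class LeastContextCache:
    capacity_gb: float
    decay: float = 0.9      # assumed per-request decay factor, not from the paper
    models: dict = field(default_factory=dict)

    def used_gb(self):
        return sum(m.size_gb for m in self.models.values())

    def request(self, name, size_gb, examples=1):
        """Serve one request; return True on a cache hit, False on a miss."""
        # Age every cached model's context before handling the request.
        for m in self.models.values():
            m.context *= self.decay
        hit = name in self.models
        if not hit:
            if size_gb > self.capacity_gb:
                raise ValueError("model is larger than the whole cache")
            # Evict the models with the least context until the new one fits.
            while self.used_gb() + size_gb > self.capacity_gb:
                victim = min(self.models, key=lambda k: self.models[k].context)
                del self.models[victim]
            self.models[name] = CachedModel(size_gb)
        # The served request adds fresh context to the model that handled it.
        self.models[name].context += examples
        return hit
```

In this sketch a model that keeps receiving requests accumulates context and stays cached, while a model whose context has aged away is the first to be evicted when GPU memory runs out. A real policy would also need to weigh inference latency and accuracy, which the example leaves out.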
Abstract
With the rapid development of artificial general intelligence (AGI), various multimedia services based on pretrained foundation models (PFMs) need to be effectively deployed. With edge servers that have cloud-level computing power, edge intelligence can extend the capabilities of AGI to mobile edge networks. However, compared with cloud data centers, resource-limited edge servers can only cache and execute a small number of PFMs, which typically consist of billions of parameters and require intensive computing power and GPU memory during inference. To address this challenge, in this paper, we propose a joint foundation model caching and inference framework that aims to balance the tradeoff among inference latency, accuracy, and resource consumption by managing cached PFMs and user requests efficiently during the provisioning of generative AI services. Specifically, considering the…
Taxonomy
Topics: Age of Information Optimization · Caching and Content Delivery · IoT and Edge/Fog Computing
