Joint Foundation Model Caching and Inference of Generative AI Services for Edge Intelligence

Minrui Xu; Dusit Niyato; Hongliang Zhang; Jiawen Kang; Zehui Xiong; Shiwen Mao; Zhu Han

arXiv:2305.12130 · cs.NI · May 23, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a joint caching and inference framework for generative AI services at edge servers, optimizing resource use and latency by managing foundation models with a novel context-aware metric.

Contribution

It proposes a new framework and a novel Age of Context metric to efficiently cache and manage foundation models for edge AI, balancing latency, accuracy, and resource constraints.

Findings

01. The proposed least context caching algorithm reduces system costs.

02. Utilizing contextual information improves caching efficiency.

03. Numerical results show significant cost savings over baselines.

Abstract

With the rapid development of artificial general intelligence (AGI), various multimedia services based on pretrained foundation models (PFMs) need to be effectively deployed. With edge servers that have cloud-level computing power, edge intelligence can extend the capabilities of AGI to mobile edge networks. However, compared with cloud data centers, resource-limited edge servers can only cache and execute a small number of PFMs, which typically consist of billions of parameters and require intensive computing power and GPU memory during inference. To address this challenge, in this paper, we propose a joint foundation model caching and inference framework that aims to balance the tradeoff among inference latency, accuracy, and resource consumption by managing cached PFMs and user requests efficiently during the provisioning of generative AI services. Specifically, considering the…

Taxonomy

Topics: Age of Information Optimization · Caching and Content Delivery · IoT and Edge/Fog Computing