Multi-Agentic AI for Fairness-Aware and Accelerated Multi-modal Large Model Inference in Real-world Mobile Edge Networks
Haiyuan Li, Hari Madhukumar, Shuangyi Yan, Yulei Wu, Dimitra Simeonidou

TL;DR
This paper introduces a multi-agent AI framework for efficient, fair, and low-latency multi-modal large model inference in mobile edge networks, addressing resource heterogeneity and privacy concerns.
Contribution
It proposes a cooperative multi-agent system utilizing foundation models for optimized prompt routing and deployment in resource-constrained edge environments.
Findings
Reduces average latency by over 80%
Achieves fairness with a normalized Jain index of 0.90
Adapts quickly without fine-tuning
Abstract
Generative AI (GenAI) has transformed applications in natural language processing and content creation, yet centralized inference remains hindered by high latency, limited customizability, and privacy concerns. Deploying large models (LMs) in mobile edge networks emerges as a promising solution. However, it also poses new challenges, including heterogeneous multi-modal LMs with diverse resource demands and inference speeds, varied prompt/output modalities that complicate orchestration, and resource-limited infrastructure ill-suited for concurrent LM execution. In response, we propose a Multi-Agentic AI framework for latency- and fairness-aware multi-modal LM inference in mobile edge networks. Our solution includes a long-term planning agent, a short-term prompt scheduling agent, and multiple on-node LM deployment agents, all powered by foundation language models. These agents…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsBig Data and Digital Economy · Advanced Neural Network Applications · Mobile Crowdsensing and Crowdsourcing
