CoEdge-RAG: Optimizing Hierarchical Scheduling for Retrieval-Augmented LLMs in Collaborative Edge Computing

Guihang Hong; Tao Ouyang; Kongyange Zhao; Zhi Zhou; Xu Chen

arXiv:2511.05915 · cs.DC · November 11, 2025

TL;DR

CoEdge-RAG introduces a hierarchical scheduling framework for collaborative edge computing that enhances retrieval-augmented LLM performance by optimizing cross-node data sharing and resource allocation.

Contribution

It presents a novel hierarchical scheduling framework with online query identification and adaptive resource management for collaborative edge LLMs.

Findings

1. Achieves up to 91.39% performance improvement over baselines.
2. Effectively balances workloads across heterogeneous edge nodes.
3. Enhances retrieval-augmented LLMs in real-time, resource-constrained environments.

Abstract

Motivated by the imperative for real-time responsiveness and data privacy preservation, large language models (LLMs) are increasingly deployed on resource-constrained edge devices to enable localized inference. To improve output quality, retrieval-augmented generation (RAG) is an efficient technique that seamlessly integrates local data into LLMs. However, existing edge computing paradigms primarily focus on single-node optimization, neglecting opportunities to holistically exploit distributed data and heterogeneous resources through cross-node collaboration. To bridge this gap, we propose CoEdge-RAG, a hierarchical scheduling framework for retrieval-augmented LLMs in collaborative edge computing. In general, privacy constraints preclude accurate a priori acquisition of heterogeneous data distributions across edge nodes, directly impeding RAG performance optimization. Thus, we first…

Taxonomy

Topics: Big Data and Digital Economy · Advanced Neural Network Applications · Caching and Content Delivery