CoEdge-RAG: Optimizing Hierarchical Scheduling for Retrieval-Augmented LLMs in Collaborative Edge Computing
Guihang Hong, Tao Ouyang, Kongyange Zhao, Zhi Zhou, Xu Chen

TL;DR
CoEdge-RAG introduces a hierarchical scheduling framework for collaborative edge computing that enhances retrieval-augmented LLM performance by optimizing cross-node data sharing and resource allocation.
Contribution
It presents a novel hierarchical scheduling framework with online query identification and adaptive resource management for collaborative edge LLMs.
Findings
Achieves up to 91.39% performance improvement over baselines.
Effectively balances workloads across heterogeneous edge nodes.
Enhances retrieval-augmented LLMs in real-time, resource-constrained environments.
Abstract
Motivated by the imperative for real-time responsiveness and data privacy preservation, large language models (LLMs) are increasingly deployed on resource-constrained edge devices to enable localized inference. To improve output quality, retrieval-augmented generation (RAG) is an efficient technique that seamlessly integrates local data into LLMs. However, existing edge computing paradigms primarily focus on single-node optimization, neglecting opportunities to holistically exploit distributed data and heterogeneous resources through cross-node collaboration. To bridge this gap, we propose CoEdge-RAG, a hierarchical scheduling framework for retrieval-augmented LLMs in collaborative edge computing. In general, privacy constraints preclude accurate a priori acquisition of heterogeneous data distributions across edge nodes, directly impeding RAG performance optimization. Thus, we first…
Taxonomy
Topics: Big Data and Digital Economy · Advanced Neural Network Applications · Caching and Content Delivery
