DGRAG: Distributed Graph-based Retrieval-Augmented Generation in Edge-Cloud Systems

Wenqing Zhou; Yuxuan Yan; Qianqian Yang

arXiv:2505.19847·cs.AI·January 29, 2026

Open Access

TL;DR

DGRAG is a distributed retrieval-augmented generation framework that enhances privacy and reduces latency in edge-cloud systems by organizing local data into knowledge graphs and selectively escalating queries to the cloud.

Contribution

It introduces a novel distributed graph-based RAG approach that balances local inference and cloud assistance, improving efficiency and privacy over centralized methods.

Findings

01. Outperforms decentralized baselines in distributed question answering
02. Reduces cloud overhead significantly
03. Maintains high factuality and response quality

Abstract

Retrieval-Augmented Generation (RAG) improves factuality by grounding LLMs in external knowledge, yet conventional centralized RAG requires aggregating distributed data, raising privacy risks and incurring high retrieval latency and cost. We present DGRAG, a distributed graph-driven RAG framework for edge-cloud collaborative systems. Each edge device organizes local documents into a knowledge graph and periodically uploads subgraph-level summaries to the cloud for lightweight global indexing without exposing raw data. At inference time, queries are first answered on the edge; a gate mechanism assesses the confidence and consistency of multiple local generations to decide whether to return a local answer or escalate the query. For escalated queries, the cloud performs summary-based matching to identify relevant edges, retrieves supporting evidence from them, and generates the final…
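The edge-side gate described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how confidence and consistency are computed, so everything concrete here is an assumption made for illustration: the `Generation` type, the per-generation `confidence` score in [0, 1], exact-match agreement after normalization, and the two thresholds. It is not the authors' implementation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Generation:
    """One local answer sampled on the edge (hypothetical structure)."""
    answer: str
    confidence: float  # assumed score in [0, 1], e.g. derived from token log-probs


def gate(
    generations: List[Generation],
    min_confidence: float = 0.7,  # assumed threshold, not taken from the paper
    min_agreement: float = 0.6,   # assumed threshold, not taken from the paper
) -> Tuple[Optional[str], bool]:
    """Return (local_answer, escalate).

    Consistency is approximated as the share of generations that agree with
    the most common normalized answer; confidence is the mean score of the
    generations that support that answer.
    """
    if not generations:
        # Nothing was generated locally, so the query is escalated.
        return None, True

    normalized = [g.answer.strip().lower() for g in generations]
    top_answer, top_count = Counter(normalized).most_common(1)[0]
    agreement = top_count / len(generations)

    supporting = [
        g.confidence for g, n in zip(generations, normalized) if n == top_answer
    ]
    mean_confidence = sum(supporting) / len(supporting)

    if agreement >= min_agreement and mean_confidence >= min_confidence:
        return top_answer, False  # answer locally on the edge
    return None, True             # escalate the query to the cloud


if __name__ == "__main__":
    samples = [
        Generation("Paris", 0.9),
        Generation("paris ", 0.8),
        Generation("Lyon", 0.4),
    ]
    print(gate(samples))
```

Exact-match agreement is the simplest possible consistency check; for free-form answers, a semantic similarity measure would likely be a better fit, at the cost of extra computation on the edge device.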

Taxonomy

Topics: Caching and Content Delivery · Recommender Systems and Techniques · Complex Network Analysis Techniques

Methods: Attention Is All You Need · Linear Layer · Attention Dropout · Softmax · WordPiece · Weight Decay · Multi-Head Attention · Layer Normalization · Byte Pair Encoding