DGRAG: Distributed Graph-based Retrieval-Augmented Generation in Edge-Cloud Systems
Wenqing Zhou, Yuxuan Yan, Qianqian Yang

TL;DR
DGRAG is a distributed retrieval-augmented generation framework that enhances privacy and reduces latency in edge-cloud systems by organizing local data into knowledge graphs and selectively escalating queries to the cloud.
Contribution
DGRAG introduces a distributed graph-based RAG approach that balances local inference with cloud assistance, improving efficiency and privacy over centralized methods.
Findings
Outperforms decentralized baselines in distributed question answering
Reduces cloud overhead significantly
Maintains high factuality and response quality
Abstract
Retrieval-Augmented Generation (RAG) improves factuality by grounding LLMs in external knowledge, yet conventional centralized RAG requires aggregating distributed data, raising privacy risks and incurring high retrieval latency and cost. We present DGRAG, a distributed graph-driven RAG framework for edge-cloud collaborative systems. Each edge device organizes local documents into a knowledge graph and periodically uploads subgraph-level summaries to the cloud for lightweight global indexing without exposing raw data. At inference time, queries are first answered on the edge; a gate mechanism assesses the confidence and consistency of multiple local generations to decide whether to return a local answer or escalate the query. For escalated queries, the cloud performs summary-based matching to identify relevant edges, retrieves supporting evidence from them, and generates the final…
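The gate described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how confidence and consistency are computed, so everything below is an assumption for illustration only: the `Generation` type, the `normalize` helper, the majority-vote consistency measure, the per-generation confidence score, and both thresholds are hypothetical and are not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Generation:
    text: str          # one sampled local answer
    confidence: float  # hypothetical score in [0, 1], e.g. mean token probability


def normalize(text: str) -> str:
    # Crude normalization so trivially different strings count as the same answer.
    return " ".join(text.lower().split()).rstrip(".")


def gate(generations: List[Generation],
         min_consistency: float = 0.6,
         min_confidence: float = 0.7) -> Optional[str]:
    """Return a local answer, or None to signal escalation to the cloud.

    Assumed rule (not from the paper): accept the majority answer only if
    enough generations agree on it and their mean confidence is high enough.
    """
    if not generations:
        return None

    # Consistency: share of generations that agree with the most common answer.
    counts = Counter(normalize(g.text) for g in generations)
    majority, votes = counts.most_common(1)[0]
    consistency = votes / len(generations)

    # Confidence: mean score over the generations that gave the majority answer.
    agreeing = [g for g in generations if normalize(g.text) == majority]
    confidence = sum(g.confidence for g in agreeing) / len(agreeing)

    if consistency >= min_consistency and confidence >= min_confidence:
        # Answer locally with the most confident agreeing generation.
        return max(agreeing, key=lambda g: g.confidence).text
    return None  # escalate the query
```

Under these assumptions, a caller on the edge device would treat a `None` result as the signal to forward the query to the cloud, and any string result as the final local answer.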
Taxonomy
Topics: Caching and Content Delivery · Recommender Systems and Techniques · Complex Network Analysis Techniques
Methods: Attention Is All You Need · Linear Layer · Attention Dropout · Softmax · WordPiece · Weight Decay · Multi-Head Attention · Layer Normalization · Byte Pair Encoding
