From Local to Global: A Graph RAG Approach to Query-Focused Summarization

Darren Edge; Ha Trinh; Newman Cheng; Joshua Bradley; Alex Chao; Apurva Mody; Steven Truitt; Dasha Metropolitansky; Robert Osazuwa Ness; Jonathan Larson

arXiv:2404.16130 · cs.CL · February 20, 2025

TL;DR

This paper introduces GraphRAG, a graph-based method that enhances question answering and summarization over large text corpora by combining entity graphs and community summaries, improving answer quality for global questions.

Contribution

GraphRAG is a novel graph-based approach that scales retrieval-augmented generation to large datasets and improves global question answering by integrating entity graphs and community summaries.

Findings

01. Significant improvements over a RAG baseline in answer comprehensiveness.

02. Enhanced diversity in generated answers.

03. Effective handling of datasets with over 1 million tokens.

Abstract

The use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge source enables large language models (LLMs) to answer questions over private and/or previously unseen document collections. However, RAG fails on global questions directed at an entire text corpus, such as "What are the main themes in the dataset?", since this is inherently a query-focused summarization (QFS) task, rather than an explicit retrieval task. Prior QFS methods, meanwhile, do not scale to the quantities of text indexed by typical RAG systems. To combine the strengths of these contrasting methods, we propose GraphRAG, a graph-based approach to question answering over private text corpora that scales with both the generality of user questions and the quantity of source text. Our approach uses an LLM to build a graph index in two stages: first, to derive an entity…
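The summary above names two ingredients, an entity graph and community summaries, so a toy sketch may help make the idea concrete. It is an illustration under stated assumptions, not the authors' implementation: the entity lists are hypothetical and pre-extracted (the paper uses an LLM for this), connected components stand in for a real community-detection algorithm, and `summarize_community` is a hypothetical placeholder for an LLM summarization call.

```python
from collections import defaultdict
from itertools import combinations


def build_entity_graph(chunks):
    """Link entities that co-occur in a chunk; edge weight = co-occurrence count."""
    graph = defaultdict(lambda: defaultdict(int))
    for entities in chunks:
        for entity in entities:
            graph[entity]  # register every entity as a node, even if isolated
        for a, b in combinations(sorted(set(entities)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def find_communities(graph):
    """Group entities by connected component.

    Assumption: this is a simple stand-in for a real community-detection
    algorithm, chosen only to keep the sketch dependency-free.
    """
    seen, communities = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            members.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        communities.append(sorted(members))
    return communities


def summarize_community(members):
    """Hypothetical placeholder for an LLM call that writes a community summary."""
    return "Community of related entities: " + ", ".join(members)


# Hypothetical pre-extracted entities, one list per text chunk.
chunks = [
    ["Alice", "Acme Corp"],
    ["Acme Corp", "Bob"],
    ["Carol", "Globex"],
]

graph = build_entity_graph(chunks)
communities = find_communities(graph)
summaries = [summarize_community(c) for c in communities]
```

In a setup like this, a global question such as "What are the main themes in the dataset?" would be answered from `summaries` rather than from individually retrieved chunks, which is the contrast with plain RAG that the abstract draws.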

Taxonomy

Methods: Attention Is All You Need · Weight Decay · Byte Pair Encoding · Linear Layer · Adam · Linear Warmup With Linear Decay · Layer Normalization · Multi-Head Attention · Dropout