TL;DR
This paper introduces GraphRAG, a graph-based method that enhances question answering and summarization over large text corpora by combining entity graphs and community summaries, improving answer quality for global questions (questions about the corpus as a whole).
Contribution
GraphRAG is a novel graph-based approach that scales retrieval-augmented generation to large datasets and improves global question answering by integrating entity graphs and community summaries.
Findings
Substantial improvements over a conventional RAG baseline in answer comprehensiveness.
Greater diversity in generated answers.
Effective on datasets in the 1 million token range.
Abstract
The use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge source enables large language models (LLMs) to answer questions over private and/or previously unseen document collections. However, RAG fails on global questions directed at an entire text corpus, such as "What are the main themes in the dataset?", since this is inherently a query-focused summarization (QFS) task, rather than an explicit retrieval task. Prior QFS methods, meanwhile, do not scale to the quantities of text indexed by typical RAG systems. To combine the strengths of these contrasting methods, we propose GraphRAG, a graph-based approach to question answering over private text corpora that scales with both the generality of user questions and the quantity of source text. Our approach uses an LLM to build a graph index in two stages: first, to derive an entity…
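The two-stage graph index described above (an entity graph first, then community summaries) can be illustrated with a small, hypothetical sketch. It is not the authors' pipeline. It assumes entities have already been extracted for each text chunk (the paper uses an LLM for this step) and links entities that co-occur in a chunk. It groups them with simple weighted label propagation as a stand-in for a proper community-detection algorithm such as Leiden, and it replaces the LLM-written community summary with a placeholder string. All function names here are illustrative.

```python
from itertools import combinations


def build_entity_graph(chunk_entities: list[set[str]]) -> dict[str, dict[str, int]]:
    """Stage 1 (sketch): link entities that co-occur in the same chunk.

    Returns an adjacency map: entity -> {neighbor: co-occurrence count}.
    """
    graph: dict[str, dict[str, int]] = {}
    for entities in chunk_entities:
        for e in entities:
            graph.setdefault(e, {})  # keep isolated entities as nodes too
        for a, b in combinations(sorted(entities), 2):
            graph[a][b] = graph[a].get(b, 0) + 1
            graph[b][a] = graph[b].get(a, 0) + 1
    return graph


def label_propagation(graph: dict[str, dict[str, int]], max_iters: int = 20) -> list[set[str]]:
    """Group entities into communities (stand-in for Leiden-style detection).

    Each node repeatedly adopts the label with the highest total edge weight
    among its neighbors; ties go to the smallest label so results are deterministic.
    """
    labels = {node: node for node in graph}
    for _ in range(max_iters):
        changed = False
        for node in sorted(graph):
            neighbors = graph[node]
            if not neighbors:
                continue  # isolated node keeps its own label
            scores: dict[str, int] = {}
            for nbr, weight in neighbors.items():
                scores[labels[nbr]] = scores.get(labels[nbr], 0) + weight
            top = max(scores.values())
            new_label = min(lbl for lbl, s in scores.items() if s == top)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    communities: dict[str, set[str]] = {}
    for node, lbl in labels.items():
        communities.setdefault(lbl, set()).add(node)
    return list(communities.values())


def summarize_community(members: set[str], graph: dict[str, dict[str, int]]) -> str:
    """Stage 2 placeholder: a real system would prompt an LLM here."""
    names = sorted(members)
    internal = sum(graph[a].get(b, 0) for a, b in combinations(names, 2))
    return f"{len(names)} entities ({', '.join(names)}), {internal} internal co-occurrences"


def build_graph_index(chunk_entities: list[set[str]]) -> list[str]:
    """Entity graph -> communities -> one summary per community."""
    graph = build_entity_graph(chunk_entities)
    communities = label_propagation(graph)
    return [summarize_community(c, graph) for c in sorted(communities, key=sorted)]


if __name__ == "__main__":
    chunks = [
        {"alice", "bob", "carol"},
        {"alice", "bob", "carol"},
        {"xavier", "yuki", "zoe"},
        {"xavier", "yuki", "zoe"},
        {"carol", "xavier"},  # one weak link between the two groups
    ]
    for summary in build_graph_index(chunks):
        print(summary)
```

On this toy input the weak `carol`/`xavier` link is outvoted by the denser intra-group edges, so the sketch yields two communities, each with its own summary. Because the summaries are precomputed at indexing time, a global question can in principle be addressed by reading the summaries rather than retrieving individual chunks.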
Taxonomy
Methods: Attention Is All You Need · Weight Decay · Byte Pair Encoding · Linear Layer · Adam · Linear Warmup With Linear Decay · Layer Normalization · Multi-Head Attention · Dropout
