Millions of $\text{GeAR}$-s: Extending GraphRAG to Millions of Documents
Zhili Shen, Chenxin Diao, Pascual Merita, Pavlos Vougiouklis, Jeff Z. Pan

TL;DR
This paper adapts the GeAR graph-based retrieval-augmented generation method to handle millions of documents, testing its scalability and effectiveness across broader datasets beyond specific tasks.
Contribution
It extends the GeAR approach to large-scale datasets, demonstrating its potential for general-purpose retrieval-augmented generation.
Findings
Scalable to millions of documents
Effective across diverse datasets
Identifies limitations of current graph-based RAG methods
Abstract
Recent studies have explored graph-based approaches to retrieval-augmented generation, leveraging structured or semi-structured information -- such as entities and their relations extracted from documents -- to enhance retrieval. However, these methods are typically designed to address specific tasks, such as multi-hop question answering and query-focused summarisation, and therefore, there is limited evidence of their general applicability across broader datasets. In this paper, we aim to adapt a state-of-the-art graph-based RAG solution, GeAR, and explore its performance and limitations on the SIGIR 2025 LiveRAG Challenge.
Taxonomy
Topics: Graph Theory and Algorithms · Algorithms and Data Compression · Constraint Satisfaction and Optimization
