Millions of $\text{GeAR}$-s: Extending GraphRAG to Millions of Documents

Zhili Shen; Chenxin Diao; Pascual Merita; Pavlos Vougiouklis; Jeff Z. Pan

arXiv:2507.17399 · cs.CL · July 24, 2025

TL;DR

This paper adapts the GeAR graph-based retrieval-augmented generation method to handle millions of documents, testing its scalability and effectiveness across broader datasets beyond specific tasks.

Contribution

It extends the GeAR approach to large-scale datasets, demonstrating its potential for general-purpose retrieval-augmented generation.

Findings

01. Scalable to millions of documents

02. Effective across diverse datasets

03. Identifies limitations of current graph-based RAG methods

Abstract

Recent studies have explored graph-based approaches to retrieval-augmented generation (RAG), leveraging structured or semi-structured information -- such as entities and their relations extracted from documents -- to enhance retrieval. However, these methods are typically designed to address specific tasks, such as multi-hop question answering and query-focused summarisation, and there is therefore limited evidence of their general applicability across broader datasets. In this paper, we aim to adapt a state-of-the-art graph-based RAG solution, $\text{GeAR}$, and explore its performance and limitations on the SIGIR 2025 LiveRAG Challenge.

Datasets

LiveRAG/Reports · dataset · 273 dl

Taxonomy

Topics: Graph Theory and Algorithms · Algorithms and Data Compression · Constraint Satisfaction and Optimization