WildGraphBench: Benchmarking GraphRAG with Wild-Source Corpora

Pengyu Wang; Benfeng Xu; Licheng Zhang; Shaohan Wang; Mingxuan Du; Chiwei Zhu; Zhendong Mao

arXiv:2602.02053 · cs.CL · February 4, 2026

TL;DR

WildGraphBench is a realistic benchmark for GraphRAG systems, built from the long, heterogeneous external documents that Wikipedia articles cite. It shows that GraphRAG is stronger at multi-fact retrieval and weaker at detailed summarization.

Contribution

This work presents WildGraphBench, a benchmark that evaluates GraphRAG on long, heterogeneous, real-world documents, addressing the reliance of earlier benchmarks on short, curated passages.

Findings

01 GraphRAG helps multi-fact aggregation when evidence comes from a moderate number of sources

02 Current systems overemphasize high-level statements

03 Summarization performance is weaker because systems favor high-level information over fine-grained detail

Abstract

Graph-based Retrieval-Augmented Generation (GraphRAG) organizes external knowledge as a hierarchical graph, enabling efficient retrieval and aggregation of scattered evidence across multiple documents. However, many existing benchmarks for GraphRAG rely on short, curated passages as external knowledge, failing to adequately evaluate systems in realistic settings involving long contexts and large-scale heterogeneous documents. To bridge this gap, we introduce WildGraphBench, a benchmark designed to assess GraphRAG performance in the wild. We leverage Wikipedia's unique structure, where cohesive narratives are grounded in long and heterogeneous external reference documents, to construct a benchmark reflecting real-world scenarios. Specifically, we sample articles across 12 top-level topics, using their external references as the retrieval corpus and citation-linked statements as ground…
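
The construction described in the abstract (external references as the retrieval corpus, citation-linked statements as the target) can be pictured with a small data model. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the class names, field names, the `build_benchmark` helper, and the rule of skipping statements with unresolved citations are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Statement:
    """A sentence from a Wikipedia article plus the references it cites (hypothetical schema)."""
    text: str
    cited_ref_ids: list[str]


@dataclass
class Article:
    """One sampled article (hypothetical schema)."""
    title: str
    topic: str                      # placeholder label for a top-level topic
    references: dict[str, str]      # ref_id -> full text of the external page
    statements: list[Statement] = field(default_factory=list)


def build_benchmark(articles: list[Article]):
    """Return (corpus, examples).

    corpus:   every external reference document, keyed by (article title, ref_id).
    examples: one record per statement whose citations all resolve to a document.
    """
    corpus: dict[tuple[str, str], str] = {}
    examples: list[dict] = []
    for art in articles:
        for ref_id, body in art.references.items():
            corpus[(art.title, ref_id)] = body
        for st in art.statements:
            # Assumption: drop statements with no citation or an unresolved citation.
            if not st.cited_ref_ids:
                continue
            if any(r not in art.references for r in st.cited_ref_ids):
                continue
            examples.append({
                "topic": art.topic,
                "gold_statement": st.text,
                "evidence_keys": [(art.title, r) for r in st.cited_ref_ids],
                # Statements citing several pages need evidence aggregation.
                "multi_source": len(st.cited_ref_ids) > 1,
            })
    return corpus, examples


# Toy input; the contents are made up for the example.
demo = Article(
    title="Example article",
    topic="placeholder-topic",
    references={"r1": "Long page A ...", "r2": "Long page B ..."},
    statements=[
        Statement("Claim backed by one page.", ["r1"]),
        Statement("Claim backed by two pages.", ["r1", "r2"]),
        Statement("Uncited claim.", []),
    ],
)
corpus, examples = build_benchmark([demo])
```

In this toy run the uncited statement is dropped, and the statement citing both `r1` and `r2` is flagged as multi-source, which is the kind of case where aggregating scattered evidence matters.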

Code & Models

Datasets

Bstwpy/WildGraphBench
dataset · 3.2k downloads

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Information Retrieval and Search Behavior