What Breaks Knowledge Graph based RAG? Benchmarking and Empirical Insights into Reasoning under Incomplete Knowledge

Dongzhuoran Zhou; Yuqicheng Zhu; Xiaxia Wang; Hongkuan Zhou; Yuan He; Jiaoyan Chen; Steffen Staab; Evgeny Kharlamov

arXiv:2508.08344 · cs.AI · January 13, 2026

TL;DR

This paper introduces BRINK, a benchmark to evaluate the reasoning capabilities of knowledge graph-based retrieval-augmented generation methods under incomplete knowledge, revealing their limited reasoning ability and reliance on memorization.

Contribution

The work presents a novel benchmark, BRINK, for systematically assessing KG-RAG methods' reasoning under incomplete knowledge and provides empirical insights into their limitations.

Findings

01. Current KG-RAG methods have limited reasoning ability.

02. Models often rely on internal memorization rather than reasoning.

03. Performance varies depending on model design and knowledge incompleteness.

Abstract

Knowledge Graph-based Retrieval-Augmented Generation (KG-RAG) is an increasingly explored approach for combining the reasoning capabilities of large language models with the structured evidence of knowledge graphs. However, current evaluation practices fall short: existing benchmarks often include questions that can be answered directly from triples already present in the KG, making it unclear whether models perform reasoning or simply retrieve answers. Moreover, inconsistent evaluation metrics and lenient answer-matching criteria further obscure meaningful comparisons. In this work, we introduce a general method for constructing benchmarks and present BRINK (Benchmark for Reasoning under Incomplete Knowledge) to systematically assess KG-RAG methods under knowledge incompleteness. Our empirical results show that current KG-RAG methods have limited reasoning ability under missing…
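
To make the "directly answerable" concern concrete, here is a minimal Python sketch on a toy graph. It is not the paper's benchmark construction method, which the abstract does not detail. The `Triple` type, the helper names, and the family-relation data are all hypothetical and chosen only for illustration. The sketch checks whether an answer is stored as a single triple, removes that triple to simulate incompleteness, and shows that the answer can then only be reached by composing two remaining triples.

```python
from typing import NamedTuple, Set


class Triple(NamedTuple):
    """One (head, relation, tail) fact in a toy knowledge graph (hypothetical type)."""
    head: str
    relation: str
    tail: str


def directly_answerable(kg: Set[Triple], head: str, relation: str, answer: str) -> bool:
    """True if the answer is stored as a single triple, so plain lookup suffices."""
    return Triple(head, relation, answer) in kg


def drop_triple(kg: Set[Triple], triple: Triple) -> Set[Triple]:
    """Return a copy of the graph without the given triple (simulated incompleteness)."""
    return kg - {triple}


def two_hop_tails(kg: Set[Triple], head: str, rel1: str, rel2: str) -> Set[str]:
    """Entities reachable from `head` by following rel1 and then rel2."""
    middles = {t.tail for t in kg if t.head == head and t.relation == rel1}
    return {t.tail for t in kg if t.head in middles and t.relation == rel2}


# Toy data (assumed for illustration only).
kg = {
    Triple("Ann", "parent_of", "Bob"),
    Triple("Bob", "parent_of", "Cy"),
    Triple("Ann", "grandparent_of", "Cy"),
}

# Question: "Who is Ann a grandparent of?" with gold answer "Cy".
target = Triple("Ann", "grandparent_of", "Cy")
incomplete_kg = drop_triple(kg, target)

print(directly_answerable(kg, *target))             # lookup works on the full graph
print(directly_answerable(incomplete_kg, *target))  # lookup fails once the triple is removed
print(two_hop_tails(incomplete_kg, "Ann", "parent_of", "parent_of"))  # answer recoverable by composition
```

On the full graph the question is answered by retrieval alone; on the reduced graph a system has to combine the two `parent_of` facts, which is the kind of distinction a benchmark for reasoning under incomplete knowledge needs to enforce.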

Taxonomy

Topics: Advanced Graph Neural Networks · Topic Modeling · Multimodal Machine Learning Applications