How Much Reasoning Do Retrieval-Augmented Models Add beyond LLMs? A Benchmarking Framework for Multi-Hop Inference over Hybrid Knowledge

Junhong Lin; Bing Zhang; Song Wang; Ziyan Liu; Dan Gutfreund; Julian Shun; Yada Zhu

arXiv:2602.10210 · cs.LG · February 12, 2026

TL;DR

This paper introduces HybridRAG-Bench, a benchmarking framework for evaluating the retrieval and multi-hop reasoning capabilities of models over hybrid knowledge sources, while addressing data contamination and the difficulty of assessing genuine reasoning.

Contribution

The paper presents HybridRAG-Bench, a novel, flexible benchmark framework for evaluating retrieval-augmented models on multi-hop reasoning over hybrid knowledge, with contamination-aware features.

Findings

01. HybridRAG-Bench effectively distinguishes genuine retrieval from parametric recall.

02. Experiments show that models improve their reasoning over hybrid knowledge sources.

03. The framework supports domain-specific and time-aware evaluations.

Abstract

Large language models (LLMs) continue to struggle with knowledge-intensive questions that require up-to-date information and multi-hop reasoning. Augmenting LLMs with hybrid external knowledge, such as unstructured text and structured knowledge graphs, offers a promising alternative to costly continual pretraining. As such, reliable evaluation of their retrieval and reasoning capabilities becomes critical. However, many existing benchmarks increasingly overlap with LLM pretraining data, which means answers or supporting knowledge may already be encoded in model parameters, making it difficult to distinguish genuine retrieval and reasoning from parametric recall. We introduce HybridRAG-Bench, a framework for constructing benchmarks to evaluate retrieval-intensive, multi-hop reasoning over hybrid knowledge. HybridRAG-Bench automatically couples unstructured text and structured knowledge…
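The question in the title (how much a retrieval-augmented model adds beyond the LLM alone) can be illustrated with a minimal, hypothetical sketch: score each question twice, once closed-book and once with retrieved text/KG context, then compare. The names `QuestionResult` and `retrieval_gain_report` and the metric definitions below are assumptions for illustration only, not HybridRAG-Bench's actual API or metrics.

```python
from dataclasses import dataclass


@dataclass
class QuestionResult:
    """Hypothetical per-question outcome for one model under two settings."""
    qid: str
    closed_book_correct: bool  # answered correctly with no retrieval
    rag_correct: bool          # answered correctly with retrieved context


def retrieval_gain_report(results):
    """Summarize how much retrieval adds over parametric knowledge (illustrative metrics)."""
    n = len(results)
    if n == 0:
        raise ValueError("results must be non-empty")
    closed = sum(r.closed_book_correct for r in results)
    rag = sum(r.rag_correct for r in results)
    # Correct only with retrieval: retrieval, not recall, supplied the answer.
    retrieval_only = sum(r.rag_correct and not r.closed_book_correct for r in results)
    # Correct closed-book but wrong with retrieval: retrieved context hurt.
    hurt = sum(r.closed_book_correct and not r.rag_correct for r in results)
    return {
        "closed_book_acc": closed / n,  # high values may signal parametric recall
        "rag_acc": rag / n,
        "retrieval_gain": (rag - closed) / n,
        "retrieval_only_frac": retrieval_only / n,
        "retrieval_hurt_frac": hurt / n,
    }


if __name__ == "__main__":
    demo = [
        QuestionResult("q1", True, True),
        QuestionResult("q2", False, True),
        QuestionResult("q3", False, False),
        QuestionResult("q4", True, False),
    ]
    print(retrieval_gain_report(demo))
```

A high closed-book accuracy on a benchmark is one sign that answers may already be encoded in model parameters; a benchmark built to be retrieval-intensive should instead show most correct answers falling in the retrieval-only group.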

Taxonomy

Topics: Advanced Graph Neural Networks · Topic Modeling · Multimodal Machine Learning Applications