RAG in the Wild: On the (In)effectiveness of LLMs with Mixture-of-Knowledge Retrieval Augmentation

Ran Xu; Yuchen Zhuang; Yue Yu; Haoyu Wang; Wenqi Shi; Carl Yang

arXiv:2507.20059 · cs.CL · July 29, 2025

Open Access

TL;DR

This paper critically evaluates retrieval-augmented generation (RAG) with large language models in realistic, diverse knowledge scenarios, revealing limitations in retrieval strategies and model routing across heterogeneous sources.

Contribution

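It provides the first large-scale analysis of RAG effectiveness in real-world, diverse knowledge environments, showing that retrieval mainly benefits smaller models, that rerankers add minimal value, and that current LLMs struggle to route queries across heterogeneous knowledge sources.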

Findings

01. Retrieval mainly benefits smaller models.

02. Rerankers add minimal value in current setups.

03. No single knowledge source consistently outperforms the others.

Abstract

Retrieval-augmented generation (RAG) enhances large language models (LLMs) by integrating external knowledge retrieved at inference time. While RAG demonstrates strong performance on benchmarks largely derived from general-domain corpora like Wikipedia, its effectiveness under realistic, diverse retrieval scenarios remains underexplored. We evaluated RAG systems using MassiveDS, a large-scale datastore with a mixture of knowledge, and identified critical limitations: retrieval mainly benefits smaller models, rerankers add minimal value, and no single retrieval source consistently excels. Moreover, current LLMs struggle to route queries across heterogeneous knowledge sources. These findings highlight the need for adaptive retrieval strategies before deploying RAG in real-world settings. Our code and data can be found at https://github.com/ritaranx/RAG_in_the_Wild.
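
To make the routing question in the abstract concrete, here is a minimal toy sketch contrasting two ways of querying a mixture of knowledge sources: routing a query to a single source first, versus retrieving from the pooled union of all sources. This is not the paper's code or method. The source names, documents, descriptions, and the token-overlap scorer are all hypothetical stand-ins, chosen only so the example runs without any retrieval library.

```python
from collections import Counter

def tokenize(text):
    return text.lower().split()

def overlap(query, doc):
    """Toy relevance score: number of query tokens that also occur in doc."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    return sum((q & d).values())

def retrieve(query, docs, k=1):
    """Top-k docs by overlap; sorted() is stable, so ties keep input order."""
    return sorted(docs, key=lambda doc: -overlap(query, doc))[:k]

def route(query, descriptions):
    """Pick the source whose (hypothetical) description best matches the query."""
    return max(descriptions, key=lambda name: overlap(query, descriptions[name]))

# Hypothetical mixture-of-knowledge datastore: three tiny sources.
SOURCES = {
    "wiki": ["Paris is the capital of France", "The Nile is a river in Africa"],
    "pubmed": ["Aspirin reduces fever and pain", "Insulin regulates blood glucose"],
    "code": ["Python lists support slicing", "Rust enforces memory safety at compile time"],
}
# Hypothetical one-line descriptions used only by the router.
DESCRIPTIONS = {
    "wiki": "general encyclopedia facts geography history capital",
    "pubmed": "biomedical medicine drug disease blood",
    "code": "programming software python rust code",
}

def routed_retrieve(query, k=1):
    """Route to one source, then retrieve only from that source."""
    source = route(query, DESCRIPTIONS)
    return source, retrieve(query, SOURCES[source], k)

def pooled_retrieve(query, k=1):
    """Retrieve from the union of all sources, keeping each doc's source tag."""
    pool = [(name, doc) for name, docs in SOURCES.items() for doc in docs]
    return sorted(pool, key=lambda item: -overlap(query, item[1]))[:k]

if __name__ == "__main__":
    query = "what regulates blood glucose"
    print(routed_retrieve(query))
    print(pooled_retrieve(query))
```

In this toy setup the router is only as good as its source descriptions: a query whose wording matches no description falls back to the first source, which is one simple way a routing step can hurt relative to pooled retrieval.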

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Information Retrieval and Search Behavior