LLM-Oriented Information Retrieval: A Denoising-First Perspective

Lu Dai; Liang Sun; Fanpu Cao; Ziyang Rao; Cehao Yang; Hao Liu; Hui Xiong

arXiv:2605.00505 · cs.IR · May 19, 2026

TL;DR

This paper emphasizes the importance of denoising in information retrieval for large language models, proposing a framework and taxonomy to improve evidence quality and verifiability across the retrieval pipeline.

Contribution

It introduces a new perspective focusing on denoising as the key challenge in LLM-oriented IR and provides a comprehensive taxonomy of related techniques.

Findings

01

Denoising, i.e. maximizing usable evidence density and verifiability within a context window, is becoming the primary bottleneck in LLM-oriented IR.

02

A four-stage framework of IR challenges: inaccessible, undiscoverable, misaligned, and unverifiable.

03

Survey of denoising techniques across various retrieval domains.

Abstract

Modern information retrieval (IR) is no longer consumed primarily by humans but increasingly by large language models (LLMs) via retrieval-augmented generation (RAG) and agentic search. Unlike human users, LLMs are constrained by limited attention budgets and are uniquely vulnerable to noise; misleading or irrelevant information is no longer just a nuisance, but a direct cause of hallucinations and reasoning failures. In this perspective paper, we argue that denoising (maximizing usable evidence density and verifiability within a context window) is becoming the primary bottleneck across the full information access pipeline. We conceptualize this paradigm shift through a four-stage framework of IR challenges: from inaccessible to undiscoverable, to misaligned, and finally to unverifiable. Furthermore, we provide a pipeline-organized taxonomy of signal-to-noise optimization techniques,…
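The abstract frames denoising as maximizing usable evidence density within a limited context window. The Python sketch below illustrates that idea under clearly stated assumptions: each passage is assumed to carry a precomputed token count and a relevance score, and the `Passage` type, the `evidence_density` metric, the `min_relevance` threshold, and the greedy `pack_context` routine are all hypothetical illustrations, not the paper's definitions or method.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    text: str
    tokens: int       # assumption: token count is precomputed
    relevance: float  # assumption: usefulness score in [0, 1] from some reranker


def evidence_density(passages: list[Passage]) -> float:
    """Hypothetical metric: relevance-weighted tokens per context token."""
    total = sum(p.tokens for p in passages)
    if total == 0:
        return 0.0
    return sum(p.relevance * p.tokens for p in passages) / total


def pack_context(passages: list[Passage], budget: int,
                 min_relevance: float = 0.5) -> list[Passage]:
    """Drop passages below a relevance threshold (treated as noise), then
    greedily fill the token budget in descending relevance order."""
    candidates = sorted(
        (p for p in passages if p.relevance >= min_relevance),
        key=lambda p: p.relevance,
        reverse=True,
    )
    selected, used = [], 0
    for p in candidates:
        if used + p.tokens <= budget:  # skip passages that would overflow the budget
            selected.append(p)
            used += p.tokens
    return selected


pool = [
    Passage("on-topic evidence", 100, 0.9),
    Passage("long off-topic page", 300, 0.2),
    Passage("supporting detail", 150, 0.7),
    Passage("related background", 200, 0.6),
]
context = pack_context(pool, budget=300)
print([p.text for p in context])            # ['on-topic evidence', 'supporting detail']
print(round(evidence_density(pool), 2))     # 0.5
print(round(evidence_density(context), 2))  # 0.78
```

Raising `min_relevance` trades recall for density. Selecting passages under a token budget to maximize total weighted value is a knapsack-style problem, so the greedy pass here is only a simple heuristic, not an optimal solver.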
