Large Language Models for Information Retrieval: A Survey

Yutao Zhu, Huaying Yuan, Shuting Wang, Jiongnan Liu, Wenhan Liu, Chenlong Deng, Haonan Chen, Zheng Liu, Zhicheng Dou, and Ji-Rong Wen

arXiv:2308.07107 · cs.CL · September 18, 2025 · 95 cites

Open Access · 1 Repo

TL;DR

This survey reviews how large language models are transforming information retrieval systems by enhancing understanding and generation, while addressing challenges like data scarcity and interpretability, and exploring future directions such as search agents.

Contribution

It provides a comprehensive overview of integrating large language models into IR, highlighting recent methodologies, challenges, and promising future research directions.

Findings

1. LLMs significantly improve semantic understanding in IR.
2. Hybrid approaches combine traditional and neural methods effectively.
3. Emerging search agents offer new capabilities for IR systems.

Abstract

As a primary means of information acquisition, information retrieval (IR) systems, such as search engines, have integrated themselves into our daily lives. These systems also serve as components of dialogue, question-answering, and recommender systems. The trajectory of IR has evolved dynamically from its origins in term-based methods to its integration with advanced neural models. While the neural models excel at capturing complex contextual signals and semantic nuances, thereby reshaping the IR landscape, they still face challenges such as data scarcity, interpretability, and the generation of contextually plausible yet potentially inaccurate responses. This evolution requires a combination of both traditional methods (such as term-based sparse retrieval methods with rapid response) and modern neural architectures (such as language models with powerful language understanding…

Code & Models

Repositories

ruc-nlpir/llm4ir-survey (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education

Methods: Multi-Head Attention · Attention Is All You Need · Adam · Softmax · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Linear Layer · Residual Connection