ExpandR: Teaching Dense Retrievers Beyond Queries with LLM Guidance

Sijia Yao; Pengcheng Huang; Zhenghao Liu; Yu Gu; Yukun Yan; Shi Yu; Ge Yu

arXiv:2502.17057 · cs.IR · May 30, 2025

TL;DR

ExpandR introduces a unified framework that jointly trains an LLM and a dense retriever, improving retrieval accuracy by optimizing query expansion and model alignment simultaneously.

Contribution

It proposes a novel joint optimization approach for LLM-guided dense retrieval, aligning generation and ranking objectives for better performance.

Findings

01. Achieves over 5% improvement in retrieval benchmarks.
02. Demonstrates effective mutual adaptation between LLM and retriever.
03. Outperforms strong baseline methods.

Abstract

Large language models (LLMs) have demonstrated significant potential in enhancing dense retrieval through query augmentation. However, most existing methods treat the LLM and the retriever as separate modules, overlooking the alignment between generation and ranking objectives. In this work, we propose ExpandR, a unified LLM-augmented dense retrieval framework that jointly optimizes both the LLM and the retriever. ExpandR employs the LLM to generate semantically rich query expansions, which are leveraged to enhance the retriever's training. Simultaneously, the LLM is trained using Direct Preference Optimization (DPO), guided by a carefully designed reward function that balances retrieval effectiveness and generation consistency. This joint optimization paradigm enables mutual adaptation between the LLM and the retriever, resulting in query expansions that are both informative and…
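The DPO step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the reward function, so everything about the reward here is an assumption: the `Candidate` fields, the `weight` parameter, the convex combination in `combined_reward`, and the best-versus-worst pairing in `preference_pair` are hypothetical stand-ins for "balancing retrieval effectiveness and generation consistency". Only `dpo_loss` follows a known formula, the standard per-pair DPO objective computed from sequence log-probabilities under the trained policy and a frozen reference model.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str                  # one LLM-generated query expansion
    retrieval_reward: float    # hypothetical signal, e.g. how highly the gold document ranks
    consistency_reward: float  # hypothetical signal, e.g. agreement with a reference generation


def combined_reward(c: Candidate, weight: float = 0.5) -> float:
    """Assumed reward: a convex combination of the two signals."""
    return weight * c.retrieval_reward + (1.0 - weight) * c.consistency_reward


def preference_pair(cands: list[Candidate], weight: float = 0.5) -> tuple[Candidate, Candidate]:
    """Return (chosen, rejected): the highest- and lowest-reward candidates."""
    ranked = sorted(cands, key=lambda c: combined_reward(c, weight), reverse=True)
    return ranked[0], ranked[-1]


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single preference pair.

    The margin is beta times the difference between the policy-vs-reference
    log-ratios of the chosen and rejected sequences; the loss is -log(sigmoid(margin)).
    """
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))


if __name__ == "__main__":
    cands = [
        Candidate("expansion a", retrieval_reward=1.0, consistency_reward=0.2),
        Candidate("expansion b", retrieval_reward=0.1, consistency_reward=0.1),
        Candidate("expansion c", retrieval_reward=0.5, consistency_reward=0.9),
    ]
    chosen, rejected = preference_pair(cands)
    print(chosen.text, "preferred over", rejected.text)
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

When the policy and the reference assign identical log-probabilities, the margin is zero and the loss equals log 2; it falls below that once the policy favors the chosen expansion more strongly than the reference does.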

Code & Models

Repositories

neuir/llm-qe (PyTorch, official)

Videos

ExpandR: Teaching Dense Retrievers Beyond Queries with LLM Guidance · Underline

Taxonomy

Topics: Information Retrieval and Search Behavior · Topic Modeling · Semantic Web and Ontologies

Methods: ALIGN