ExpandR: Teaching Dense Retrievers Beyond Queries with LLM Guidance
Sijia Yao, Pengcheng Huang, Zhenghao Liu, Yu Gu, Yukun Yan, Shi Yu, Ge Yu

TL;DR
ExpandR introduces a unified framework that jointly trains an LLM and a dense retriever, improving retrieval accuracy by optimizing query expansion and model alignment simultaneously.
Contribution
It proposes a novel joint optimization approach for LLM-guided dense retrieval, aligning generation and ranking objectives for better performance.
Findings
Achieves an improvement of more than 5% on retrieval benchmarks.
Demonstrates effective mutual adaptation between LLM and retriever.
Outperforms strong baseline methods.
Abstract
Large language models (LLMs) have demonstrated significant potential in enhancing dense retrieval through query augmentation. However, most existing methods treat the LLM and the retriever as separate modules, overlooking the alignment between generation and ranking objectives. In this work, we propose ExpandR, a unified LLM-augmented dense retrieval framework that jointly optimizes both the LLM and the retriever. ExpandR employs the LLM to generate semantically rich query expansions, which are leveraged to enhance the retriever's training. Simultaneously, the LLM is trained using Direct Preference Optimization (DPO), guided by a carefully designed reward function that balances retrieval effectiveness and generation consistency. This joint optimization paradigm enables mutual adaptation between the LLM and the retriever, resulting in query expansions that are both informative and…
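The abstract describes the training signal only at a high level, so the following is a minimal sketch of how a reward that balances retrieval effectiveness and generation consistency could select preference pairs for DPO. It is not the authors' implementation. The reciprocal-rank reward, the cosine-similarity consistency term, the equal weighting, and all function names (`rank_reward`, `consistency_reward`, `build_preference_pair`) are illustrative assumptions. Only `dpo_loss` follows the standard published form of the DPO objective for a single pair.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rank_reward(expansion_vec, doc_vecs, gold_index):
    # Assumed retrieval reward: reciprocal rank of the gold document
    # when documents are scored by dot product with the expansion.
    scores = [dot(expansion_vec, d) for d in doc_vecs]
    rank = 1 + sum(1 for s in scores if s > scores[gold_index])
    return 1.0 / rank

def consistency_reward(expansion_vec, reference_vec):
    # Assumed consistency reward: cosine similarity to a reference
    # embedding (for example, one derived from the gold document).
    norm = math.sqrt(dot(expansion_vec, expansion_vec)) * math.sqrt(
        dot(reference_vec, reference_vec)
    )
    return dot(expansion_vec, reference_vec) / norm if norm > 0 else 0.0

def combined_reward(expansion_vec, doc_vecs, gold_index, reference_vec, weight=0.5):
    # Hypothetical linear mix; the paper's actual balance is not given here.
    return weight * rank_reward(expansion_vec, doc_vecs, gold_index) + (
        1.0 - weight
    ) * consistency_reward(expansion_vec, reference_vec)

def build_preference_pair(candidates, doc_vecs, gold_index, reference_vec):
    # candidates maps expansion text -> embedding of that expansion.
    # The highest-reward expansion is "chosen", the lowest is "rejected".
    ranked = sorted(
        candidates,
        key=lambda text: combined_reward(
            candidates[text], doc_vecs, gold_index, reference_vec
        ),
    )
    return ranked[-1], ranked[0]

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Standard DPO loss for one pair, given sequence log-probabilities:
    # -log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)]).
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return math.log1p(math.exp(-beta * margin))

# Toy data: two documents, the first one is the gold document.
docs = [[1.0, 0.0], [0.0, 1.0]]
cands = {"expansion A": [1.0, 0.0], "expansion B": [0.0, 1.0]}
chosen, rejected = build_preference_pair(cands, docs, 0, [1.0, 0.0])
```

In this toy setup the expansion that ranks the gold document first and matches the reference embedding becomes the chosen sample. In a real pipeline the embeddings would come from the retriever being trained and the log-probabilities from the LLM and its frozen reference copy.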
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Semantic Web and Ontologies
Methods: ALIGN
