TCDE: Topic-Centric Dual Expansion of Queries and Documents with Large Language Models for Information Retrieval
Yu Yang, Feng Tian, Ping Chen

TL;DR
This paper introduces TCDE, a novel topic-centric dual expansion method using large language models to improve semantic alignment between queries and documents in information retrieval tasks.
Contribution
We propose a dual expansion strategy with LLM-guided prompts for both queries and documents, enhancing semantic alignment and retrieval performance.
Findings
Significant improvements on TREC Deep Learning and BEIR benchmarks.
Outperforms state-of-the-art expansion baselines.
Achieves a 2.8% relative gain in NDCG@10 on SciFact.
Abstract
Query Expansion (QE) enriches queries and Document Expansion (DE) enriches documents, and these two techniques are often applied separately. However, such separate application may lead to semantic misalignment between the expanded queries (or documents) and their relevant documents (or queries). To address this serious issue, we propose TCDE, a dual expansion strategy that leverages large language models (LLMs) for topic-centric enrichment on both queries and documents. In TCDE, we design two distinct prompt templates for processing each query and document. On the query side, an LLM is guided to identify distinct sub-topics within each query and generate a focused pseudo-document for each sub-topic. On the document side, an LLM is guided to distill each document into a set of core topic sentences. The resulting outputs are used to expand the original query and document. This…
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Advanced Graph Neural Networks
