Progressive Query Expansion for Retrieval Over Cost-constrained Data Sources
Muhammad Shihab Rashid, Jannat Ara Meem, Yue Dong, Vagelis Hristidis

TL;DR
This paper introduces ProQE, a progressive query expansion method combining pseudo-relevance feedback and large language models, which iteratively improves retrieval accuracy while minimizing costs in API-restricted data sources.
Contribution
ProQE is a novel iterative query expansion algorithm that effectively combines PRF and LLMs, optimizing retrieval performance and cost-efficiency in cost-constrained data environments.
Findings
ProQE outperforms state-of-the-art baselines by 37%.
ProQE is the most cost-effective method tested.
ProQE is compatible with both sparse and dense retrieval systems.
Abstract
Query expansion has long been used to improve the accuracy of query retrievers. Earlier work relied on pseudo-relevance feedback (PRF) techniques, which augment a query with terms extracted from documents retrieved in a first stage. However, these documents may be noisy, which hinders the effectiveness of the ranking. To avoid this, recent studies have instead used Large Language Models (LLMs) to generate additional content to expand a query. These techniques are prone to hallucination, and they focus on the LLM usage cost. However, the cost may be dominated by retrieval in several important practical scenarios, where the corpus is only available via APIs that charge a fee per retrieved document. We propose combining classic PRF techniques with LLMs to create a progressive query expansion algorithm, ProQE, that iteratively expands the query as it retrieves more documents.…
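The idea of expanding a query progressively while paying per retrieved document can be illustrated with a small sketch. This is not the ProQE algorithm itself, whose details the abstract does not give. It is a toy loop under stated assumptions: `retrieve_one` stands in for a paid retrieval API, `keyword_judge` is a hypothetical placeholder for an LLM relevance judgment, and the 0.5 weight for new terms and the `terms_per_step` parameter are arbitrary illustrative choices.

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "in", "is", "and", "to", "for", "from"}

def tokenize(text):
    """Lowercase whitespace tokenizer that drops a few stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]

def retrieve_one(query_weights, corpus, seen):
    """Toy stand-in for a paid retrieval API.

    Returns the id of the best-scoring unseen document, or None when no
    unseen document shares a term with the (weighted) query.
    """
    best_id, best_score = None, 0.0
    for doc_id, text in corpus.items():
        if doc_id in seen:
            continue
        counts = Counter(tokenize(text))
        score = sum(w * counts[t] for t, w in query_weights.items())
        if score > best_score:
            best_id, best_score = doc_id, score
    return best_id

def progressive_expand(query, corpus, judge, budget,
                       cost_per_doc=1.0, terms_per_step=2):
    """Retrieve one document at a time until the budget is spent.

    After each retrieval, `judge` (assumed to wrap an LLM in practice)
    decides whether the document is relevant. If so, its most frequent
    unseen terms are added to the query, PRF-style, at a lower weight.
    """
    weights = Counter(tokenize(query))
    seen, spent = [], 0.0
    while spent + cost_per_doc <= budget:
        doc_id = retrieve_one(weights, corpus, seen)
        if doc_id is None:
            break
        seen.append(doc_id)
        spent += cost_per_doc
        if judge(query, corpus[doc_id]):
            new_terms = Counter(
                t for t in tokenize(corpus[doc_id]) if t not in weights
            )
            for term, _ in new_terms.most_common(terms_per_step):
                weights[term] = 0.5  # assumed weight for expansion terms
    return seen, dict(weights), spent

# Hypothetical four-document corpus, for illustration only.
CORPUS = {
    "d1": "query expansion improves retrieval accuracy",
    "d2": "relevance feedback expansion uses feedback terms",
    "d3": "cooking pasta recipes",
    "d4": "feedback from retrieved documents",
}

def keyword_judge(query, document):
    """Placeholder for an LLM relevance check (assumption, not from the paper)."""
    return "expansion" in document

if __name__ == "__main__":
    docs, final_weights, cost = progressive_expand(
        "query expansion", CORPUS, keyword_judge, budget=3
    )
    print(docs, cost)
```

In this toy setup the original query shares no terms with `d4`, so a single-shot retrieval would never reach it. The term `feedback`, picked up from `d2` after it is judged relevant, is what lets the third paid retrieval return `d4`.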
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Web Data Mining and Analysis
