Understanding the User: An Intent-Based Ranking Dataset
Abhijit Anand, Jurek Leonhardt, V Venktesh, Avishek Anand

TL;DR
This paper augments web search benchmark datasets (TREC-DL-21 and TREC-DL-22) with detailed query descriptions that are generated by LLMs and validated through crowdsourcing, making user intent explicit for the evaluation of retrieval systems.
Contribution
It introduces a novel method combining LLMs and crowdsourcing to annotate query intent in benchmark datasets, enriching their utility for evaluation tasks.
Findings
Generated descriptions are validated through crowdsourcing.
Enhanced datasets provide richer context for ranking evaluation.
Method improves understanding of implicit user intent.
Abstract
As information retrieval systems continue to evolve, accurate evaluation and benchmarking of these systems become pivotal. Web search datasets, such as MS MARCO, primarily provide short keyword queries without accompanying intent or descriptions, posing a challenge in comprehending the underlying information need. This paper proposes an approach to augmenting such datasets to annotate informative query descriptions, with a focus on two prominent benchmark datasets: TREC-DL-21 and TREC-DL-22. Our methodology involves utilizing state-of-the-art LLMs to analyze and comprehend the implicit intent within individual queries from benchmark datasets. By extracting key semantic elements, we construct detailed and contextually rich descriptions for these queries. To validate the generated query descriptions, we employ crowdsourcing as a reliable means of obtaining diverse human perspectives on…
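The pipeline described in the abstract (an LLM drafts a description of the query's intent, then crowd workers validate it) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt wording, the optional use of relevant passages as context, the `llm` callable, and the majority-vote threshold are all hypothetical.

```python
from typing import Callable, Sequence


def build_intent_prompt(query: str, passages: Sequence[str] = ()) -> str:
    """Assemble a prompt asking an LLM to describe the intent behind a query.

    The wording and the optional passage context are assumptions made for
    this sketch; the paper's actual prompt is not given in the abstract.
    """
    lines = [
        "Write a one-sentence description of the information need "
        "behind this web search query.",
        f"Query: {query}",
    ]
    if passages:
        lines.append("Relevant passages:")
        lines.extend(f"- {p}" for p in passages)
    lines.append("Description:")
    return "\n".join(lines)


def describe_query(
    query: str,
    llm: Callable[[str], str],
    passages: Sequence[str] = (),
) -> str:
    """Generate a description with any text-in, text-out model (hypothetical `llm`)."""
    return llm(build_intent_prompt(query, passages)).strip()


def accept_description(votes: Sequence[bool], min_agreement: float = 0.5) -> bool:
    """Keep a description only if more than `min_agreement` of workers approve it.

    Simple majority voting is an assumed aggregation rule, not one taken
    from the paper.
    """
    if not votes:
        return False
    return sum(votes) / len(votes) > min_agreement


if __name__ == "__main__":
    # Stand-in for a real model call, used only to show the data flow.
    def fake_llm(prompt: str) -> str:
        return " The user wants to know how tall the Eiffel Tower is. "

    description = describe_query("eiffel tower height", fake_llm)
    print(description)
    print(accept_description([True, True, False]))
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM provider; a real setup would replace `fake_llm` with an actual model call and collect `votes` from the crowdsourcing platform.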
Taxonomy
Topics: Recommender Systems and Techniques
Methods: Sparse Evolutionary Training · Focus
