TL;DR
This paper evaluates a web-based data enrichment method for pseudo-relevance feedback in information retrieval, analyzing its robustness and effectiveness across different search engines, queries, and test collections.
Contribution
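It provides a comprehensive analysis of web content-based data enrichment for relevance feedback, extending the approach first tried by Grossman and Cormack at TREC Common Core 2018 with systematic experiments on system performance over time across different search engines, queries, and test collections.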
Findings
The method is robust in terms of average retrieval performance across the search engines, queries, and test collections considered.
Enriching topics with web content is a promising way to supply training data for relevance feedback methods.
Performance varies with search engine and query type.
Abstract
In this work, we analyze a pseudo-relevance retrieval method based on the results of web search engines. By enriching topics with text data from web search engine result pages and linked contents, we train topic-specific and cost-efficient classifiers that can be used to search test collections for relevant documents. Building upon attempts initially made at TREC Common Core 2018 by Grossman and Cormack, we address questions of system performance over time considering different search engines, queries, and test collections. Our experimental results show how and to what extent the considered components affect the retrieval performance. Overall, the analyzed method is robust in terms of average retrieval performance and a promising way to use web content for the data enrichment of relevance feedback methods.
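The pipeline described in the abstract (enrich a topic with web text, train a topic-specific classifier, rank a test collection) can be illustrated with a minimal sketch. It is not the authors' implementation: the smoothed log-odds term weighting stands in for whatever classifier the paper uses, the use of the collection itself as pseudo-negative background is an assumption, and the names `tokenize`, `train_log_odds`, `rank`, `web_texts`, and `collection` are hypothetical. The web texts are hard-coded here in place of content scraped from search engine result pages.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train_log_odds(positive_texts, negative_texts):
    """Learn one weight per term: smoothed log-odds of positive vs. negative text.

    positive_texts: web content retrieved for the topic (pseudo-relevant).
    negative_texts: background text assumed to be mostly non-relevant.
    """
    pos = Counter(t for text in positive_texts for t in tokenize(text))
    neg = Counter(t for text in negative_texts for t in tokenize(text))
    vocab = set(pos) | set(neg)
    # Add-one (Laplace) smoothing so unseen terms do not produce log(0).
    pos_total = sum(pos.values()) + len(vocab)
    neg_total = sum(neg.values()) + len(vocab)
    return {
        t: math.log((pos[t] + 1) / pos_total) - math.log((neg[t] + 1) / neg_total)
        for t in vocab
    }


def rank(weights, collection):
    """Rank document ids by the mean term weight of their tokens, best first."""
    def score(text):
        tokens = tokenize(text)
        if not tokens:
            return float("-inf")
        # Terms outside the training vocabulary contribute a neutral 0.
        return sum(weights.get(t, 0.0) for t in tokens) / len(tokens)

    scores = {doc_id: score(text) for doc_id, text in collection.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical topic "wind energy": texts stand in for scraped web content.
web_texts = [
    "Wind turbines convert wind energy into electricity",
    "Offshore wind farms expand renewable energy capacity",
]

# Toy test collection; it also serves as the pseudo-negative background.
collection = {
    "d1": "New offshore wind farm to supply renewable electricity",
    "d2": "Local football team wins the championship final",
    "d3": "Stock markets fall as interest rates rise",
}

weights = train_log_odds(web_texts, list(collection.values()))
ranking = rank(weights, collection)
print(ranking[0])  # the wind-energy document "d1" scores highest here
```

Because the classifier is trained per topic from web text alone, no relevance judgments from the test collection are needed, which is what makes the approach cost-efficient. Swapping the search engine or the query changes only `web_texts`, which is the component whose effect the paper studies.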
