Streamlined Data Fusion: Unleashing the Power of Linear Combination with Minimal Relevance Judgments
Qiuyu Xu, Yidong Huang, Shengli Wu, Adrian Moore

TL;DR
This paper demonstrates that near-optimal data fusion weights can be trained using only 20-50% of the relevant documents, significantly reducing the relevance-judgment effort needed in information retrieval.
Contribution
It introduces a method to train effective linear combination weights using minimal relevance judgments, making data fusion more practical and cost-effective.
Findings
Weights trained with 20-50% of the relevant documents closely match those trained with the full relevance judgments.
Linear regression effectively estimates optimal fusion weights with limited relevance judgments.
Reduced relevance data still yields high-quality data fusion performance.
Abstract
Linear combination is a potent data fusion method in information retrieval tasks, thanks to its ability to adjust weights for diverse scenarios. However, achieving optimal weight training has traditionally required manual relevance judgments on a large percentage of documents, a labor-intensive and expensive process. In this study, we investigate the feasibility of obtaining near-optimal weights using a mere 20%-50% of relevant documents. Through experiments on four TREC datasets, we find that weights trained with multiple linear regression using this reduced set closely rival those obtained with TREC's official "qrels." Our findings unlock the potential for more efficient and affordable data fusion, empowering researchers and practitioners to reap its full benefits with significantly less effort.
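The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes each component retrieval system assigns a score to every document, that weights are fitted by ordinary least squares against binary relevance labels, and that the "reduced set" is simulated by keeping a random fraction of the relevant labels and treating everything else as non-relevant. The function names, the synthetic data, and the 30% fraction are all hypothetical.

```python
import numpy as np

def train_fusion_weights(scores, relevance):
    """Fit linear combination weights by ordinary least squares.

    scores:    (n_docs, n_systems) score of each document from each system
    relevance: (n_docs,) relevance labels used as the regression target
    Returns (intercept, weights).
    """
    X = np.column_stack([np.ones(len(scores)), scores])  # add intercept column
    coef, *_ = np.linalg.lstsq(X, relevance, rcond=None)
    return coef[0], coef[1:]

def fuse(scores, weights):
    """Linear combination: fused score = sum_i weights[i] * scores[:, i]."""
    return scores @ weights

def subsample_relevant(relevance, fraction, rng):
    """Keep a random fraction of the relevant labels; the rest become 0.

    Assumption for illustration: unjudged documents are treated as non-relevant.
    """
    reduced = np.zeros_like(relevance)
    relevant_idx = np.flatnonzero(relevance > 0)
    n_keep = max(1, int(round(fraction * len(relevant_idx))))
    keep = rng.choice(relevant_idx, size=n_keep, replace=False)
    reduced[keep] = relevance[keep]
    return reduced

# Synthetic data (hypothetical): three systems with different noise levels.
rng = np.random.default_rng(0)
n_docs, n_systems = 500, 3
relevance = (rng.random(n_docs) < 0.2).astype(float)
noise_levels = np.array([0.3, 0.6, 1.0])
scores = relevance[:, None] + rng.normal(size=(n_docs, n_systems)) * noise_levels

# Weights from full judgments versus weights from 30% of the relevant documents.
_, w_full = train_fusion_weights(scores, relevance)
reduced = subsample_relevant(relevance, 0.3, rng)
_, w_part = train_fusion_weights(scores, reduced)

fused_full = fuse(scores, w_full)
fused_part = fuse(scores, w_part)
```

Documents would then be ranked by descending fused score. The intercept is dropped at fusion time because adding a constant to every score does not change the ranking.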
Taxonomy
Topics: Image Retrieval and Classification Techniques · Text and Document Classification Technologies · Information Retrieval and Search Behavior
Methods: Linear Regression
