Resource Selection for Federated Search on the Web

Dong Nguyen; Thomas Demeester; Dolf Trieschnigg; Djoerd Hiemstra

arXiv:1609.04556 · cs.IR · September 16, 2016 · 2 cites

Open Access

TL;DR

This paper introduces a new dataset and evaluates resource selection methods for federated web search, highlighting challenges like sparse descriptions and skewed collection sizes, and proposing size estimation techniques.

Contribution

It provides a new web federated search dataset, analyzes resource size estimation, and compares resource selection methods under real web conditions.

Findings

1. Size estimation improves resource selection accuracy
2. Smaller web search engines can effectively replace larger ones
3. Existing resource selection methods face challenges due to data sparsity and skewed sizes

Abstract

A publicly available dataset for federated search reflecting a real web environment has long been absent, making it difficult for researchers to test the validity of their federated search algorithms for the web setting. We present several experiments and analyses on resource selection on the web using a recently released test collection containing the results from more than a hundred real search engines, ranging from large general web search engines such as Google, Bing and Yahoo to small domain-specific engines. First, we experiment with estimating the size of uncooperative search engines on the web using query-based sampling and propose a new method using the ClueWeb09 dataset. We find the size estimates to be highly effective in resource selection. Second, we show that an optimized federated search system based on smaller web search engines can be an alternative to a system using…
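
To make the idea of sizing an uncooperative engine from samples concrete, here is a minimal Python sketch of the classic capture-recapture (Lincoln-Petersen) estimator. It is a generic technique chosen for illustration, not the ClueWeb09-based method the abstract proposes. The function name and the document IDs are hypothetical, and the sketch assumes the two samples are drawn independently and uniformly, which samples obtained through probe queries only approximate.

```python
def lincoln_petersen(sample_a, sample_b):
    """Estimate a collection's size from two document-ID samples.

    Assumption: both samples are independent, uniform draws from the
    same collection. The estimate is |A| * |B| / |A intersect B|.
    """
    a, b = set(sample_a), set(sample_b)
    overlap = len(a & b)
    if overlap == 0:
        # With no shared documents the estimator is undefined.
        raise ValueError("samples share no documents; size cannot be estimated")
    return len(a) * len(b) / overlap


# Hypothetical document IDs returned by two rounds of probe queries.
round_one = ["d1", "d2", "d3", "d4", "d5", "d6"]
round_two = ["d4", "d5", "d6", "d7", "d8", "d9", "d10", "d11"]

estimate = lincoln_petersen(round_one, round_two)
print(estimate)  # 6 * 8 / 3 = 16.0
```

A small overlap between rounds implies a large collection, so the estimate becomes unstable when few documents are shared. Search engines also tend to return popular documents more often than a uniform draw would, which is one reason raw estimates of this kind can be biased.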

Taxonomy

Topics: Web Data Mining and Analysis · Caching and Content Delivery · Data Management and Algorithms