Document Selection in a Distributed Search Engine Architecture
Ibrahim AlShourbaji, Samaher Al-Janabi, Ahmed Patel

TL;DR
This paper explores methods for selecting optimal databases in a distributed search engine to improve search efficiency and quality by reducing data scope and minimizing biases.
Contribution
It evaluates the effectiveness of various database selection strategies within a distributed search engine framework to enhance information retrieval performance.
Findings
Database selection improves search speed and reduces resource usage.
A selection index effectively captures broad information about each database and its indexed documents.
Optimal database choices enhance retrieval quality and reduce biases.
Abstract
Distributed Search Engine Architecture (DSEA) hosts numerous independent topic-specific search engines and selects a subset of the databases to search within the architecture. The objective of this approach is to reduce the amount of space needed to perform a search by querying only a subset of the total data available. In order to manipulate data across many databases, it is most efficient to identify a smaller subset of databases that would be most likely to return the data of specific interest that can then be examined in greater detail. The selection index has been most commonly used as a method for choosing the most applicable databases as it captures broad information about each database and its indexed documents. Employing this type of database allows the researcher to find information more quickly and at lower cost, and it also minimizes the potential for biases. This…
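The abstract describes a selection index as a summary of each database and its indexed documents, used to pick the databases most likely to answer a query. The sketch below illustrates that general idea only and is not the authors' method. The function names (`build_selection_index`, `select_databases`), the per-term document-frequency summary, the scoring rule, and the toy data are all assumptions introduced for illustration.

```python
from collections import Counter


def build_selection_index(databases):
    """Summarize each database as (document count, per-term document frequency).

    Assumption: a whitespace tokenizer and document frequencies stand in for
    whatever summary statistics a real selection index would store.
    """
    index = {}
    for name, docs in databases.items():
        df = Counter()
        for doc in docs:
            # Count each term once per document.
            df.update(set(doc.lower().split()))
        index[name] = (len(docs), df)
    return index


def select_databases(index, query, k=2):
    """Return up to k database names ranked by a simple relevance score.

    Assumed scoring rule: sum, over query terms, of the fraction of the
    database's documents that contain the term.
    """
    terms = query.lower().split()
    scores = {}
    for name, (n_docs, df) in index.items():
        if n_docs == 0:
            scores[name] = 0.0
            continue
        scores[name] = sum(df[t] / n_docs for t in terms)
    # Sort by descending score, breaking ties by name for determinism.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    # Databases with no matching term are never selected.
    return [name for name in ranked[:k] if scores[name] > 0]


# Hypothetical topic-specific databases.
databases = {
    "medicine": ["heart disease treatment", "diabetes treatment options"],
    "sports": ["football match results", "tennis match schedule"],
    "cooking": ["pasta recipe", "heart healthy recipe"],
}
index = build_selection_index(databases)
selected = select_databases(index, "heart treatment", k=2)
print(selected)
```

Only the selected databases would then be queried in full, which is how the approach narrows the scope of a search. A production index would likely use richer statistics and a more careful ranking function than this toy score.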
Taxonomy
Topics: Data Management and Algorithms · Data Mining Algorithms and Applications
