Design of a Parallel and Distributed Web Search Engine
Salvatore Orlando (1), Raffaele Perego (2), Fabrizio Silvestri (1, 3)
(1) Dipartimento di Informatica, Università di Venezia, Mestre, Italy; (2) Istituto di Scienza e Tecnologia per l'Informazione (A. Faedo), Pisa, Italy; (3) Dipartimento di Informatica

TL;DR
This paper presents MOSE, a scalable parallel and distributed web search engine architecture designed to efficiently utilize clusters of workstations, enhancing throughput through task and data parallelism.
Contribution
The paper introduces MOSE, a modular, scalable architecture for web search engines optimized for affordable parallel hardware like clusters of workstations.
Findings
MOSE effectively exploits task and data parallelism.
Preliminary experiments on a cluster of three SMP Linux PCs show promising scalability.
The architecture can be tuned for various bandwidth requirements.
Abstract
This paper describes the architecture of MOSE (My Own Search Engine), a scalable parallel and distributed engine for searching the web. MOSE was specifically designed to efficiently exploit affordable parallel architectures, such as clusters of workstations. Its modular and scalable architecture can easily be tuned to fulfill the bandwidth requirements of the application at hand. Both task-parallel and data-parallel approaches are exploited within MOSE to increase throughput and to use communication, storage, and computational resources efficiently. We used a collection of HTML documents as a benchmark and conducted preliminary experiments on a cluster of three SMP Linux PCs.
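To make the data-parallel idea in the abstract more concrete, the following is a minimal sketch and not MOSE's actual implementation. It assumes the document collection is split into disjoint partitions, each with its own small inverted index, and that a broker fans a query out to all partitions and merges the partial rankings. The function names (`build_index`, `partition`, `local_search`, `search`), the round-robin partitioning, and the term-frequency scoring are all illustrative assumptions; the abstract does not specify any of them. Threads stand in here for the cluster nodes.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor
import heapq


def build_index(docs):
    """Build a tiny inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] += 1
    return index


def partition(docs, n_parts):
    """Split the collection into n_parts disjoint sub-collections (round robin).

    Hypothetical partitioning strategy, chosen only for illustration.
    """
    parts = [{} for _ in range(n_parts)]
    for i, (doc_id, text) in enumerate(sorted(docs.items())):
        parts[i % n_parts][doc_id] = text
    return parts


def local_search(index, query, k):
    """Score one partition's documents by summed term frequency; return top k."""
    scores = Counter()
    for term in query.lower().split():
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return heapq.nlargest(k, ((score, doc_id) for doc_id, score in scores.items()))


def search(indexes, query, k=3):
    """Broker: send the query to every partition in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=len(indexes)) as pool:
        partials = pool.map(lambda idx: local_search(idx, query, k), indexes)
        merged = [hit for part in partials for hit in part]
    # Each partition returned its local top k, so the global top k is in `merged`.
    return [doc_id for _, doc_id in heapq.nlargest(k, merged)]
```

Because every partition returns its own top k hits, merging them yields the same ranking as searching a single index over the whole collection, while each partition's work can proceed independently. The task-parallel side mentioned in the abstract (separate cooperating modules) is not modeled in this sketch.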
Taxonomy
Topics: Web Data Mining and Analysis · Caching and Content Delivery · Peer-to-Peer Network Technologies
