URL ordering policies for distributed crawlers: a review

Deepika; Ashutosh Dixit

arXiv:1611.01228·cs.IR·November 7, 2016·2 cites

Open Access

TL;DR

This paper reviews various URL ordering policies for distributed web crawlers, analyzing their efficiency and effectiveness to improve web crawling performance.

Contribution

It provides a comprehensive survey and comparison of existing URL ordering methods for distributed crawlers, highlighting their strengths and limitations.

Findings

01 Different URL ordering policies vary in efficiency

02 Some methods outperform others in freshness and coverage

03 The survey identifies gaps for future research

Abstract

As the web grows, the information it holds spreads at a large scale, and search engines are the medium for accessing it. The crawler is the search engine module responsible for downloading web pages. To download fresh information and keep the database rich, the crawler should crawl the web in some order; this is called URL ordering. URL ordering must be done efficiently and effectively for the crawl to be proficient. This paper surveys some existing methods of URL ordering and concludes with a comparison among them.
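
To make the idea of URL ordering concrete, the following is a minimal sketch of a crawl frontier that hands out URLs in priority order. It is not taken from the paper: the class name `URLFrontier`, the helper `crawl_order`, and the numeric `score` are all hypothetical, and the abstract does not say which scoring policy (for example link-based or freshness-based) any surveyed method uses. The sketch only assumes that some policy assigns each URL a score and that higher scores should be crawled first.

```python
import heapq
import itertools


class URLFrontier:
    """Hypothetical priority-ordered queue of URLs waiting to be crawled."""

    def __init__(self):
        self._heap = []                    # entries are (-score, seq, url)
        self._seen = set()                 # URLs that were already enqueued
        self._counter = itertools.count()  # tie-breaker: equal scores stay FIFO

    def add(self, url, score):
        """Enqueue url unless it was seen before; higher score pops sooner."""
        if url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the largest first.
        heapq.heappush(self._heap, (-score, next(self._counter), url))
        return True

    def pop(self):
        """Remove and return the highest-scoring pending URL."""
        _, _, url = heapq.heappop(self._heap)
        return url

    def __len__(self):
        return len(self._heap)


def crawl_order(scored_urls):
    """Return the order in which the frontier would release the given URLs."""
    frontier = URLFrontier()
    for url, score in scored_urls:
        frontier.add(url, score)
    order = []
    while len(frontier):
        order.append(frontier.pop())
    return order
```

Swapping in a different ordering policy would, under this assumption, only change how `score` is computed before `add` is called; the frontier itself stays the same. In a distributed crawler each node would typically hold its own frontier over a partition of the URL space, which this single-process sketch does not model.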

Taxonomy

Topics: Web Data Mining and Analysis · Distributed and Parallel Computing Systems · Optimization and Search Problems