URL ordering policies for distributed crawlers: a review
Deepika, Ashutosh Dixit

TL;DR
This paper reviews various URL ordering policies for distributed web crawlers, analyzing their efficiency and effectiveness to improve web crawling performance.
Contribution
It provides a comprehensive survey and comparison of existing URL ordering methods for distributed crawlers, highlighting their strengths and limitations.
Findings
Different URL ordering policies vary in efficiency
Some methods outperform others in freshness and coverage
The survey identifies gaps for future research
Abstract
As the web grows, the information it holds spreads at large scale, and search engines are the medium for accessing it. The crawler is the search engine module responsible for downloading web pages. To download fresh information and keep the database rich, the crawler should crawl the web in some order; this is called URL ordering. URL ordering must be done efficiently and effectively for the crawl to be proficient. This paper surveys some existing URL ordering methods and concludes with a comparison among them.
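To make the idea of URL ordering concrete, here is a minimal sketch of a crawl frontier that hands out the highest-priority URL first. It is an illustration only and is not taken from the paper: the `Frontier` class, its `add` and `pop` methods, the example URLs, and the numeric scores are all hypothetical. In practice the score would come from whichever ordering policy is in use.

```python
import heapq
import itertools


class Frontier:
    """Priority-ordered URL frontier (hypothetical, for illustration only)."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        # Tie-breaker so URLs with equal scores leave in insertion order.
        self._counter = itertools.count()

    def add(self, url, score):
        """Queue a URL once; a higher score means it is crawled earlier."""
        if url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the largest first.
        heapq.heappush(self._heap, (-score, next(self._counter), url))
        return True

    def pop(self):
        """Return the highest-scoring queued URL, or None if the frontier is empty."""
        if not self._heap:
            return None
        _, _, url = heapq.heappop(self._heap)
        return url

    def __len__(self):
        return len(self._heap)


# Assumed scores; a real crawler would compute them from its ordering policy.
frontier = Frontier()
frontier.add("http://example.com/a", 0.2)
frontier.add("http://example.com/b", 0.9)
frontier.add("http://example.com/c", 0.5)
frontier.add("http://example.com/b", 0.1)  # already seen, so it is ignored

order = []
while len(frontier):
    order.append(frontier.pop())
print(order)
```

Swapping in a different scoring function changes the crawl order without touching the frontier itself, which is one way to compare ordering policies side by side.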
Taxonomy
Topics: Web Data Mining and Analysis · Distributed and Parallel Computing Systems · Optimization and Search Problems
