Analysis of Statistical Hypothesis based Learning Mechanism for Faster Crawling
Sudarshan Nandy, Partha Pratim Sarkar, Achintya Das

TL;DR
This paper introduces a statistical hypothesis-based learning mechanism to adaptively control crawling speed in search engines, improving performance and resource retrieval efficiency amid exponential web growth.
Contribution
It presents a novel learning mechanism that predicts and adjusts crawling speed according to the network environment, improving web crawling efficiency.
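To make the idea of hypothesis-driven speed control concrete, the following is a minimal sketch, not the authors' mechanism, whose details are not given on this page. It assumes a one-sample z-test on recent response latencies against a known baseline, and it assumes the crawler's speed is controlled through a per-request delay. The names `z_statistic` and `adjust_delay`, the critical value 1.96, the scaling factor, and the delay bounds are all hypothetical choices made for illustration.

```python
import math
from statistics import mean


def z_statistic(samples, baseline_mean, baseline_std):
    """z-score of the sample mean under H0: true mean == baseline_mean.

    Assumes baseline_std is a known, positive standard deviation (hypothetical setup).
    """
    n = len(samples)
    return (mean(samples) - baseline_mean) / (baseline_std / math.sqrt(n))


def adjust_delay(delay, samples, baseline_mean, baseline_std,
                 z_crit=1.96, factor=2.0, min_delay=0.1, max_delay=30.0):
    """Return a new per-request delay (seconds) for a hypothetical crawler.

    If recent latencies are significantly above the baseline, slow down;
    if significantly below, speed up; otherwise keep the current delay.
    """
    z = z_statistic(samples, baseline_mean, baseline_std)
    if z > z_crit:        # reject H0: the network looks slower than baseline
        delay *= factor
    elif z < -z_crit:     # reject H0: the network looks faster than baseline
        delay /= factor
    # Clamp so the crawler never stalls or floods the server.
    return min(max(delay, min_delay), max_delay)


if __name__ == "__main__":
    recent = [0.9, 1.1, 1.0, 1.0]  # made-up latencies in seconds
    print(adjust_delay(1.0, recent, baseline_mean=0.5, baseline_std=0.2))
```

A two-sided test is used here so that the same rule can both slow down and speed up the crawler; a real system would likely also re-estimate the baseline over time.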
Findings
High-speed performance after applying the scaling technique
Improved relevance in web resource retrieval
Effective control of crawling speed based on environment
Abstract
The World Wide Web (WWW) has grown from a negligible number of web pages into a gigantic hub of web information, which steadily increases the complexity of the crawling process in a search engine. A search engine handles many queries from all parts of the world, and its answers depend solely on the knowledge it gathers by crawling. Information sharing has become a common habit in society, carried out by publishing structured, semi-structured, and unstructured resources on the web. This social practice leads to exponential growth of web resources, so it has become essential to crawl continuously, in any situation, to update web knowledge and capture modifications to existing resources. In this paper, a statistical hypothesis based learning mechanism is incorporated for learning the behavior of crawling speed in…
