Analysis of Statistical Hypothesis based Learning Mechanism for Faster Crawling

Sudarshan Nandy; Partha Pratim Sarkar; Achintya Das

arXiv:1208.2261·cs.IR·August 14, 2012

TL;DR

This paper introduces a statistical hypothesis-based learning mechanism to adaptively control crawling speed in search engines, improving performance and resource retrieval efficiency amid exponential web growth.

Contribution

The paper presents a learning mechanism that predicts and adjusts crawling speed according to the network environment, with the aim of making web crawling more efficient.
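The general idea of using a hypothesis test to steer crawl speed can be illustrated with a minimal sketch. This is not the authors' method: the summary here does not say which test statistic or control rule the paper uses. The sketch assumes a Welch t statistic computed over response latencies, a fixed critical value of 2.0, and a multiplicative delay update; the names `welch_t` and `adjust_delay` and all parameters are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(baseline, recent):
    """Welch's t statistic; positive when recent latencies exceed the baseline."""
    diff = mean(recent) - mean(baseline)
    se = math.sqrt(variance(baseline) / len(baseline)
                   + variance(recent) / len(recent))
    if se == 0.0:
        # No spread in either sample: any nonzero difference is decisive.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def adjust_delay(delay, baseline, recent, t_crit=2.0, factor=1.5,
                 min_delay=0.1, max_delay=30.0):
    """Return a new inter-request delay in seconds (illustrative rule only).

    Null hypothesis: recent latencies have the same mean as the baseline.
    If it is rejected on the slow side, back off; on the fast side, speed up.
    t_crit=2.0 is an assumed rough two-sided 5% threshold, not a value
    taken from the paper.
    """
    t = welch_t(baseline, recent)
    if t > t_crit:        # network looks significantly slower: crawl more gently
        delay *= factor
    elif t < -t_crit:     # network looks significantly faster: crawl harder
        delay /= factor
    # Clamp so the crawler never stalls or hammers the server.
    return min(max(delay, min_delay), max_delay)


baseline = [0.20, 0.22, 0.19, 0.21, 0.20]   # seconds per request, made-up data
congested = [0.60, 0.65, 0.58, 0.62, 0.61]
new_delay = adjust_delay(1.0, baseline, congested)
```

With the made-up samples above, the congested latencies reject the null hypothesis on the slow side, so the delay grows from 1.0 to 1.5 seconds. A real crawler would also need to respect per-host politeness limits, which this sketch does not model.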

Findings

01. High-speed performance after the scaling technique is applied

02. Improved relevance in web resource retrieval

03. Effective control of crawling speed based on the environment

Abstract

The World Wide Web (WWW) has grown from a negligible number of web pages into a gigantic hub of web information, which steadily increases the complexity of the crawling process in a search engine. A search engine handles a large number of queries from all over the world, and its answers depend solely on the knowledge it gathers by crawling. Information sharing has become a common social habit, carried out by publishing structured, semi-structured, and unstructured resources on the web. This practice leads to exponential growth of web resources, so continuous crawling has become essential for keeping web knowledge up to date and for tracking modifications to existing resources. In this paper, a statistical hypothesis based learning mechanism is incorporated for learning the behavior of crawling speed in…
