Using sensors in the web crawling process
Ilya Zemskov

TL;DR
This paper proposes a sensor-based system for web crawling that detects resource changes on web servers to optimize reindexing, supported by simulation results and implementation efforts.
Contribution
It introduces a novel sensor module for web servers to improve web crawling efficiency by detecting resource changes.
Findings
Sensor system effectively detects resource changes
Simulation results show improved reindexing efficiency
Implementation demonstrates feasibility of sensor-based web crawling
Abstract
This paper briefly describes a system for monitoring the Internet information field. A dedicated sensor module is placed on the Web server side to detect changes in information resources, so that only the resources flagged by the corresponding sensor are subsequently reindexed. Concise results of a simulation study and of an attempt to implement the "sensors" concept are provided.
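The sensor idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the sensor detects changes, so the content-hash comparison below is an assumption, and the names `Sensor`, `scan`, and `reindex` are hypothetical rather than taken from the paper. The sketch only shows the division of labor: the server-side sensor reports which resources changed, and the crawler refreshes just those entries.

```python
import hashlib


class Sensor:
    """Hypothetical server-side sensor: remembers one digest per resource."""

    def __init__(self):
        self._digests = {}  # resource path -> SHA-256 hex digest from the last scan

    def scan(self, resources):
        """Return the sorted paths whose content is new or changed since the last scan."""
        changed = []
        for path, content in resources.items():
            digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
            if self._digests.get(path) != digest:
                changed.append(path)
                self._digests[path] = digest
        return sorted(changed)


def reindex(index, resources, changed_paths):
    """Crawler side: refresh only the index entries the sensor flagged."""
    for path in changed_paths:
        index[path] = resources[path]
    return len(changed_paths)


# Toy site held in memory (an assumption; a real sensor would watch server files).
site = {"/a.html": "alpha", "/b.html": "beta"}
sensor = Sensor()
index = {}

first = sensor.scan(site)        # every resource is new on the first scan
reindex(index, site, first)

site["/b.html"] = "beta v2"      # one resource changes on the server
second = sensor.scan(site)       # only the changed resource is reported
fetched = reindex(index, site, second)
```

In this sketch the second crawl touches one resource instead of two. Deleted resources are not handled, and a real deployment would also need a channel for the sensor to notify the crawler.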
Taxonomy
Topics: Web Data Mining and Analysis · Advanced Data Processing Techniques
