A Brief History of Web Crawlers
Seyed M. Mirtaheri, Mustafa Emre Dinçktürk, Salman Hooshmand, Gregor V. Bochmann, Guy-Vincent Jourdan, Iosif Viorel Onut

TL;DR
This paper provides a historical overview of web crawlers, highlighting their evolution, challenges, and solutions, while introducing evaluation criteria and comparing performance across different crawler techniques.
Contribution
It offers a comprehensive history of web crawlers, proposes criteria for performance evaluation, and compares various crawler algorithms over time.
Findings
Web crawlers have evolved from simple data collectors to complex tools for indexing, security, and accessibility.
Measured against the proposed evaluation criteria, web crawler performance has improved significantly over time.
Challenges such as exhaustive crawling and modeling modern web applications remain open research areas.
Abstract
Web crawlers visit internet applications, collect data, and learn about new web pages from visited pages. Web crawlers have a long and interesting history. Early web crawlers collected statistics about the web. In addition to collecting statistics about the web and indexing applications for search engines, modern crawlers can be used to perform accessibility and vulnerability checks on an application. The rapid expansion of the web and the growing complexity of web applications have made crawling very challenging. Throughout the history of web crawling, many research and industrial groups have addressed different issues and challenges that web crawlers face. Different solutions have been proposed to reduce the time and cost of crawling. Performing an exhaustive crawl remains a challenging problem. Additionally, capturing the model of a modern web application and extracting…
Taxonomy
Topics: Web Data Mining and Analysis · Software Testing and Debugging Techniques · Software Engineering Research
