
TL;DR
This paper introduces methods for building and deploying web spiders and scrapers to efficiently and safely retrieve large-scale information from complex networks on the internet.
Contribution
It provides an introductory guide on programming and deploying web spiders and scrapers with a focus on safety and efficiency.
Findings
Effective techniques for safe web crawling
Strategies for high-efficiency data retrieval
Guidelines for deploying web scraping software
Abstract
In recent years, the study of complex networks has received a lot of attention. Real systems have gained importance in scientific publications, despite an important drawback: the difficulty of retrieving and managing such a great quantity of information. This paper is intended as an introduction to the construction of spiders and scrapers: specifically, how to program and safely deploy this kind of software application. The aim is to show how software can be prepared to automatically surf the net and retrieve information for the user with high efficiency and safety.
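The abstract does not specify an implementation, so the following is only a minimal sketch of what a "safe" spider might look like, using the Python standard library. It assumes that safety means honoring robots.txt, staying on the starting host, capping the number of pages, and pausing between requests. The names `LinkParser`, `extract_links`, `crawl`, and the `example-bot` user agent are hypothetical and do not come from the paper.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkParser(HTMLParser):
    """Collects the href values of <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Scraper step: return absolute http(s) links found in html."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        # Resolve relative links and drop the #fragment part.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).scheme in ("http", "https"):
            links.append(absolute)
    return links


def crawl(start_url, fetch, robots, max_pages=10, delay=1.0, agent="example-bot"):
    """Spider step: breadth-first crawl restricted to the starting host.

    fetch is any callable mapping a URL to its HTML text (an assumption,
    so that the network layer can be swapped or mocked).
    robots is a RobotFileParser that has already been loaded.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if not robots.can_fetch(agent, url):
            continue  # respect robots.txt exclusions
        html = fetch(url)
        pages[url] = html
        for link in extract_links(html, url):
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
        if delay:
            time.sleep(delay)  # be polite: pause between requests
    return pages
```

In a real deployment, `fetch` could wrap `urllib.request.urlopen`, and `robots` could be loaded with `set_url(...)` followed by `read()`. Passing `fetch` in as a parameter is a design choice made here so the crawl logic can be exercised offline with canned pages.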
Taxonomy
Topics: Wikis in Education and Collaboration · Web Data Mining and Analysis
