Anywhere: A Web Crawler Automation Management Interface
Jinwei Lin

TL;DR
Anywhere is a management interface that enhances the usability and efficiency of Scrapy web crawling projects by providing quick project generation, repeatable configurations, and multi-project management, significantly improving development speed.
Contribution
The paper introduces Anywhere, a framework that simplifies and accelerates Scrapy web crawler management through automation and multi-project handling features.
Findings
Development efficiency improved by about 200%.
Multi-project management efficiency increased by about 300%.
Simplifies quick spider project generation.
Abstract
Web crawling is significant in the current information age. A web spider or crawler can automatically search for and collect a huge amount of information from the internet. As one of the most popular web crawler frameworks, Scrapy is rich in functionality but not easy to operate. In this paper, we present Anywhere, a framework for improving the user experience and efficiency of web crawling management with Scrapy. We analyse the whole workflow of a Scrapy web crawling project and design two main functions in Anywhere: quickly generating a Scrapy project from preset templates, and a repeatable configuration function for Scrapy project settings. In addition, with Anywhere, users can directly manage multiple Scrapy projects through a file-folder architecture. Compared with normal Scrapy project interactive coding…
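The abstract does not show how Anywhere applies a repeatable configuration across several projects, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes each project sits in its own folder under a common root and is recognised by its `scrapy.cfg` file. The helper names (`find_scrapy_projects`, `render_preset`, `apply_preset`) and the preset values are hypothetical; the setting names themselves are standard Scrapy settings.

```python
from pathlib import Path

# Hypothetical preset: the names are real Scrapy settings,
# the values are illustrative only.
PRESET = {
    "ROBOTSTXT_OBEY": True,
    "DOWNLOAD_DELAY": 1.0,
    "CONCURRENT_REQUESTS": 8,
}

# Marker line used to find and replace a previously written preset block.
MARKER = "# --- shared preset (hypothetical helper) ---\n"


def find_scrapy_projects(root):
    """Return the folders under root that contain a scrapy.cfg file."""
    return sorted(cfg.parent for cfg in Path(root).rglob("scrapy.cfg"))


def render_preset(preset):
    """Render a preset dict as Python assignments for a settings.py file."""
    return "".join(f"{name} = {value!r}\n" for name, value in preset.items())


def apply_preset(project_dir, preset):
    """Write the preset at the end of each settings.py in the project.

    Re-running replaces the earlier preset block instead of duplicating it,
    which is what makes the configuration step repeatable.
    """
    touched = []
    for settings in sorted(Path(project_dir).rglob("settings.py")):
        text = settings.read_text(encoding="utf-8")
        if MARKER in text:
            # Drop the previously written block before writing the new one.
            text = text.split(MARKER)[0]
        if text and not text.endswith("\n"):
            text += "\n"
        settings.write_text(text + MARKER + render_preset(preset), encoding="utf-8")
        touched.append(settings)
    return touched
```

A caller would loop over `find_scrapy_projects(root)` and call `apply_preset` on each folder. Because the preset is appended after the project's own settings, its values take precedence when Scrapy imports the module.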
Taxonomy
Topics: Web Data Mining and Analysis · Distributed and Parallel Computing Systems · Service-Oriented Architecture and Web Services
