Beyond BeautifulSoup: Benchmarking LLM-Powered Web Scraping for Everyday Users
Arth Bhardwaj, Nirav Diwan, Gang Wang

TL;DR
This paper benchmarks how large language models enable everyday users to perform web scraping tasks on complex sites, demonstrating that end-to-end LLM agents can automate data extraction with minimal prompts and effort.
Contribution
The paper systematically evaluates LLM-based web scraping workflows across sites protected by diverse security measures, highlighting their accessibility and practical effectiveness for non-expert users.
Findings
End-to-end LLM agents can automate complex scraping with minimal prompts.
LLM-assisted scripting is effective for static sites and faster in some cases.
Users can achieve successful scraping with fewer than five prompt refinements.
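For readers unfamiliar with what a script for a static site looks like, the following is a minimal sketch of the kind of parser a user might obtain in the LLM-assisted scripting workflow. It is an illustration only and is not taken from the paper: the class name TitleLinkParser, the helper extract_links, and the SAMPLE_HTML snippet are all hypothetical, and it uses Python's standard-library html.parser rather than BeautifulSoup so that it runs without extra dependencies.

```python
from html.parser import HTMLParser


class TitleLinkParser(HTMLParser):
    """Collect (text, href) pairs for every <a href=...> tag in static HTML."""

    def __init__(self):
        super().__init__()
        self.links = []      # accumulated (text, href) results
        self._href = None    # href of the <a> tag currently open, if any
        self._text = []      # text fragments seen inside the open <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only record text while inside an <a> tag that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html):
    """Return a list of (link text, href) tuples found in the HTML string."""
    parser = TitleLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical static page content, inlined so no network access is needed.
SAMPLE_HTML = """
<ul>
  <li><a href="/item/1">First item</a></li>
  <li><a href="/item/2">Second <b>item</b></a></li>
</ul>
"""

if __name__ == "__main__":
    for text, href in extract_links(SAMPLE_HTML):
        print(text, href)
```

The sketch covers only the static-page case named in the findings. Sites with authentication, anti-bot, or CAPTCHA controls are outside what this example attempts.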
Abstract
Web scraping has historically required technical expertise in HTML parsing, session management, and authentication circumvention, which limited large-scale data extraction to skilled developers. We argue that large language models (LLMs) have democratized web scraping, enabling low-skill users to execute sophisticated operations through simple natural language prompts. While extensive benchmarks evaluate these tools under optimal expert conditions, we show that without extensive manual effort, current LLM-based workflows allow novice users to scrape complex websites that would otherwise be inaccessible. We systematically benchmark what everyday users can do with off-the-shelf LLM tools across 35 sites spanning five security tiers, including authentication, anti-bot, and CAPTCHA controls. We devise and evaluate two distinct workflows: (a) LLM-assisted scripting, where users prompt LLMs…
Taxonomy
Topics: Web Application Security Vulnerabilities · Spam and Phishing Detection · Security and Verification in Computing
