Somesite I Used To Crawl: Awareness, Agency and Efficacy in Protecting Content Creators From AI Crawlers
Enze Liu, Elisa Luo, Shawn Shan, Geoffrey M. Voelker, Ben Y. Zhao, Stefan Savage

TL;DR
This paper investigates the effectiveness of current online tools and network-level measures in helping content creators, especially artists, protect their work from AI web crawlers, highlighting gaps in awareness and tool efficacy.
Contribution
It provides large-scale measurements and a user study to assess the awareness, agency, and effectiveness of existing crawler-blocking tools for content creators.
Findings
Artists show high demand for crawler-blocking tools.
Technical awareness and deployment agency are limited among artists.
Network-level blockers like reverse proxies offer stronger protection but have limitations.
Abstract
The success of generative AI relies heavily on training on data scraped through extensive crawling of the Internet, a practice that has raised significant copyright, privacy, and ethical concerns. While few measures are designed to resist a resource-rich adversary determined to scrape a site, crawlers can be impacted by a range of existing tools such as robots.txt, NoAI meta tags, and active crawler blocking by reverse proxies. In this work, we seek to understand the ability and efficacy of today's networking tools to protect content creators against AI-related crawling. For targeted populations like human artists, do they have the technical knowledge and agency to utilize crawler-blocking tools such as robots.txt, and can such tools be effective? Using large scale measurements and a targeted user study of 203 professional artists, we find strong demand for tools like robots.txt, but…
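To make the robots.txt mechanism mentioned in the abstract concrete, here is a minimal sketch of how a compliant crawler would interpret a site's rules. It is an illustration only and is not taken from the paper. The sample robots.txt contents, the example.com URL, and the helper name is_allowed are all assumptions made for this example, and "GPTBot" is used simply as one example of an AI crawler user-agent token. The sketch relies on Python's standard-library urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: block one AI crawler token,
# allow every other user agent.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    is_allowed is a hypothetical helper for this sketch. It answers what a
    rule-following crawler should do; it cannot enforce anything.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/gallery/piece.png"  # assumed example URL
    print("GPTBot allowed:", is_allowed(SAMPLE_ROBOTS_TXT, "GPTBot", page))
    print("Other agent allowed:", is_allowed(SAMPLE_ROBOTS_TXT, "SomeBrowser", page))
```

Because robots.txt is advisory, a check like this only describes the behavior of crawlers that choose to honor it. Active blocking, such as the reverse-proxy approach the abstract mentions, has to happen on the server side.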
Taxonomy
Topics: Advanced Malware Detection Techniques · Software Engineering Research · Security and Verification in Computing
