Web Intellectual Property at Risk: Preventing Unauthorized Real-Time Retrieval by Large Language Models

Yisheng Zhong; Yizhu Wen; Junfeng Guo; Mehran Kafai; Heng Huang; Hanqing Guo; Zhuangdi Zhu

arXiv:2505.12655·cs.CR·June 9, 2025

TL;DR

This paper introduces a novel framework that helps web content creators protect their intellectual property from unauthorized real-time data extraction by large language models, addressing a critical emerging threat.

Contribution

The paper presents a defense mechanism that leverages LLMs' own semantic understanding to prevent unauthorized IP retrieval, outperforming traditional configuration-based restrictions.

Findings

01. Defense success rate improved from 2.5% to 88.6%.

02. Outperforms traditional configuration-based restrictions.

03. Effective against multiple LLMs.
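
For context on the second finding, a configuration-based restriction can be sketched with a robots.txt policy. This summary does not name the baseline it compares against, so using robots.txt as the example is an assumption, and the crawler name `ExampleLLMBot` and the helper `is_allowed` are hypothetical. The sketch relies only on Python's standard `urllib.robotparser` and shows how a compliant crawler would interpret such a policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block one named crawler, allow everyone else.
# "ExampleLLMBot" is an illustrative user agent, not a real product.
ROBOTS_TXT = """\
User-agent: ExampleLLMBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the policy permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() accepts the policy as an iterable of lines, so no network
    # request is needed for this illustration.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"
    print(is_allowed("ExampleLLMBot", page))
    print(is_allowed("SomeOtherAgent", page))
```

A policy like this only binds clients that choose to consult it. Nothing in the file itself stops a retrieval pipeline from fetching the page anyway, which may be part of why a content-level defense is attractive.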

Abstract

The protection of cyber Intellectual Property (IP) such as web content is an increasingly critical concern. The rise of large language models (LLMs) with online retrieval capabilities enables convenient access to information but often undermines the rights of original content creators. As users increasingly rely on LLM-generated responses, they engage less directly with original information sources, which significantly reduces the incentives for IP creators to contribute and leads to a cyberspace increasingly saturated with AI-generated content. In response, we propose a novel defense framework that empowers web content creators to safeguard their web-based IP from unauthorized real-time extraction and redistribution by LLMs, leveraging the semantic understanding capability of LLMs themselves. Our method follows principled motivations and effectively addresses an intractable…

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Advanced Graph Neural Networks · Spam and Phishing Detection