Data Defenses Against Large Language Models
William Agnew, Harry H. Jiang, Cella Sum, Maarten Sap, Sauvik Das

TL;DR
This paper introduces "data defenses", a method that lets data owners generate adversarial prompt injections that prevent large language models from accurately inferring sensitive or copyrighted information, giving owners direct control over how their data is used.
Contribution
The paper develops a technique that automatically creates adversarial prompt injections that block LLM inference on protected data, targeting harms such as surveillance, labor displacement, and IP/copyright theft.
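To make the idea concrete, here is a minimal sketch of how a data owner might attach a defense to a piece of text and how that text might then end up inside someone else's LLM prompt. It assumes the defense is a plain string embedded alongside the protected text. The names `embed_defense`, `build_inference_prompt`, and `PLACEHOLDER_DEFENSE` are hypothetical and do not come from the paper, and the sketch does not cover how the paper generates the adversarial string.

```python
def embed_defense(text: str, defense: str, position: str = "end") -> str:
    """Return `text` with the `defense` string embedded at `position`.

    Assumption: the defense is ordinary text placed before or after
    the content the owner wants to protect.
    """
    if position == "start":
        return f"{defense}\n\n{text}"
    if position == "end":
        return f"{text}\n\n{defense}"
    raise ValueError(f"unknown position: {position!r}")


def build_inference_prompt(task: str, document: str) -> str:
    """Mimic how a third party might wrap scraped text in an LLM prompt."""
    return f"{task}\n\n---\n{document}\n---"


# Hypothetical placeholder only; a real defense would be an
# automatically generated adversarial prompt injection.
PLACEHOLDER_DEFENSE = "[adversarial injection text would go here]"

protected = embed_defense("I moved to Lisbon last spring.", PLACEHOLDER_DEFENSE)
prompt = build_inference_prompt("Where does the author live?", protected)
```

The point of the sketch is that the defense travels with the data: whoever builds `prompt` from the protected text carries the injection into the model's input without the owner needing any cooperation from the model provider.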
Findings
Data defenses significantly reduce LLM inference accuracy.
The method is effective against commercial and open-source LLMs.
Data defenses are cheap, fast, and resistant to countermeasures.
Abstract
Large language models excel at performing inference over text to extract information, summarize information, or generate additional text. These inference capabilities are implicated in a variety of ethical harms spanning surveillance, labor displacement, and IP/copyright theft. While many policy, legal, and technical mitigations have been proposed to counteract these harms, these mitigations typically require cooperation from institutions that move slower than technical advances (i.e., governments) or that have few incentives to act to counteract these harms (i.e., the corporations that create and profit from these LLMs). In this paper, we define and build "data defenses" -- a novel strategy that directly empowers data owners to block LLMs from performing inference on their data. We create data defenses by developing a method to automatically generate adversarial prompt injections that,…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
