Towards More Realistic Extraction Attacks: An Adversarial Perspective
Yash More, Prakhar Ganesh, Golnoosh Farnadi

TL;DR
This paper investigates realistic extraction attacks on language models, revealing that multi-faceted, repeated, and varied prompts significantly increase data extraction risks, even against mitigations, with implications for privacy and copyright.
Contribution
The paper introduces a comprehensive adversarial perspective that considers multiple access points (model sizes, checkpoints, and repeated prompting), demonstrating increased extraction risks and outperforming prior methods in realistic scenarios.
Findings
Extraction risks double with combined attacks.
Small prompt changes can significantly alter extraction outcomes.
Mitigation strategies like data deduplication are less effective against advanced attacks.
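The first two findings can be illustrated with a minimal sketch. It assumes each attack configuration yields a set of extracted sample IDs, treats a sample as extracted if any configuration recovers it, and measures churn as the Jaccard distance between two extracted sets. The function names, the configuration labels, and the toy data are all hypothetical and are not taken from the paper; the paper's own metrics may be defined differently.

```python
# Illustrative sketch only: names and toy data are assumptions, not the paper's code or results.

def extraction_rate(extracted, total):
    """Fraction of the evaluated samples that were extracted."""
    return len(extracted) / total

def combined_extraction(results):
    """Union of the extracted sets: a sample counts if ANY attack recovers it."""
    combined = set()
    for ids in results.values():
        combined |= ids
    return combined

def churn(a, b):
    """Jaccard distance: share of samples extracted by exactly one of two attacks."""
    union = a | b
    if not union:
        return 0.0
    return len(a ^ b) / len(union)

# Toy data (made up): IDs of samples extracted under each hypothetical configuration.
TOTAL = 10
results = {
    "base_prompt": {0, 1, 2},
    "perturbed_prompt": {1, 2, 3},
    "smaller_model": {2, 4},
    "earlier_checkpoint": {0, 5},
}

combined = combined_extraction(results)
base_rate = extraction_rate(results["base_prompt"], TOTAL)
combined_rate = extraction_rate(combined, TOTAL)
prompt_churn = churn(results["base_prompt"], results["perturbed_prompt"])

print(f"single attack: {base_rate:.2f}, combined: {combined_rate:.2f}")
print(f"churn between prompt variants: {prompt_churn:.2f}")
```

In this toy setup the combined adversary gains over any single attack only because the individual attacks extract partly different samples, which is the role churn plays in the findings above.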
Abstract
Language models are prone to memorizing their training data, making them vulnerable to extraction attacks. While existing research often examines isolated setups, such as a single model or a fixed prompt, real-world adversaries have a considerably larger attack surface due to access to models across various sizes and checkpoints, and repeated prompting. In this paper, we revisit extraction attacks from an adversarial perspective -- with multi-faceted access to the underlying data. We find significant churn in extraction trends, i.e., even unintuitive changes to the prompt, or targeting smaller models and earlier checkpoints, can extract distinct information. By combining multiple attacks, our adversary doubles (2×) the extraction risks, persisting even under mitigation strategies like data deduplication. We conclude with four case studies, including detecting pre-training data,…
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: Sparse Evolutionary Training
