Towards More Realistic Extraction Attacks: An Adversarial Perspective

Yash More; Prakhar Ganesh; Golnoosh Farnadi

arXiv:2407.02596 · cs.CR · August 11, 2025

TL;DR

This paper studies extraction attacks on language models under a realistic adversary. It shows that multi-faceted access (multiple model sizes and checkpoints, plus repeated and varied prompts) significantly increases data extraction risk, even under mitigations such as data deduplication, with implications for privacy and copyright.

Contribution

The paper introduces a comprehensive adversarial perspective in which the attacker has multiple access points to the underlying data. It demonstrates increased extraction risk under this view and outperforms prior methods in realistic scenarios.

Findings

01. Combining multiple attacks doubles the extraction risk.

02. Small, even unintuitive, changes to the prompt can significantly alter what is extracted.

03. Extraction risk persists under mitigation strategies such as data deduplication when attacks are combined.

Abstract

Language models are prone to memorizing their training data, making them vulnerable to extraction attacks. While existing research often examines isolated setups, such as a single model or a fixed prompt, real-world adversaries have a considerably larger attack surface due to access to models across various sizes and checkpoints, and repeated prompting. In this paper, we revisit extraction attacks from an adversarial perspective -- with multi-faceted access to the underlying data. We find significant churn in extraction trends, i.e., even unintuitive changes to the prompt, or targeting smaller models and earlier checkpoints, can extract distinct information. By combining multiple attacks, our adversary doubles ($2\times$) the extraction risks, persisting even under mitigation strategies like data deduplication. We conclude with four case studies, including detecting pre-training data,…
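The notions of "churn" and of a combined adversary in the abstract can be illustrated with a small set-based sketch. It is not the authors' implementation. It assumes each attack (a prompt variant, model size, or checkpoint) has already been run and has produced a set of IDs for the training samples it extracted. The function names, the symmetric-difference definition of churn, and the toy IDs are all hypothetical and chosen for illustration; the numbers are not results from the paper.

```python
from typing import Dict, Set


def extraction_rate(extracted: Set[int], n_samples: int) -> float:
    """Fraction of the evaluated samples that an attack extracted."""
    return len(extracted) / n_samples


def combined_extraction(results: Dict[str, Set[int]]) -> Set[int]:
    """A sample counts as extracted if ANY attack extracted it (set union)."""
    combined: Set[int] = set()
    for ids in results.values():
        combined |= ids
    return combined


def churn(a: Set[int], b: Set[int]) -> float:
    """Share of samples extracted by exactly one of two attacks.

    Illustrative definition (an assumption): symmetric difference
    divided by the union of the two extracted sets.
    """
    union = a | b
    if not union:
        return 0.0
    return len(a ^ b) / len(union)


# Toy data (hypothetical): IDs of extracted samples per attack configuration.
results = {
    "base_prompt": {0, 1, 2, 3},
    "perturbed_prompt": {2, 3, 4, 5},
    "smaller_model": {1, 6},
    "earlier_checkpoint": {3},
}
n_samples = 20

single_best = max(extraction_rate(ids, n_samples) for ids in results.values())
combined_rate = extraction_rate(combined_extraction(results), n_samples)
prompt_churn = churn(results["base_prompt"], results["perturbed_prompt"])

print(f"best single attack: {single_best:.2f}")
print(f"combined adversary: {combined_rate:.2f}")
print(f"churn between prompts: {prompt_churn:.2f}")
```

Because the combined set is a union, its rate can never fall below that of the best single attack. High churn between attacks is what makes the union substantially larger than any single attack's set.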

Code & Models

Repositories

equal-mila/llm_extraction_eval (Official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: Sparse Evolutionary Training