Fragments to Facts: Partial-Information Fragment Inference from LLMs
Lucas Rosenblatt, Bin Han, Robert Wolfe, Bill Howe

TL;DR
This paper demonstrates that fine-tuned large language models are vulnerable to fragment-specific extraction attacks even with partial, unordered sample information, highlighting new privacy risks.
Contribution
It introduces a general threat model for partial-information attacks and proposes two novel data-blind methods, including the PRISM approach, to effectively perform these attacks.
Findings
Both data-blind methods are competitive with data-aware baselines.
Attacks succeed in medical and legal data scenarios.
Fine-tuned LLMs are vulnerable to fragment-based extraction.
Abstract
Large language models (LLMs) can leak sensitive training data through memorization and membership inference attacks. Prior work has primarily focused on strong adversarial assumptions, including attacker access to entire samples or long, ordered prefixes, leaving open the question of how vulnerable LLMs are when adversaries have only partial, unordered sample information. For example, if an attacker knows a patient has "hypertension," under what conditions can they query a model fine-tuned on patient data to learn the patient also has "osteoarthritis"? In this paper, we introduce a more general threat model under this weaker assumption and show that fine-tuned LLMs are susceptible to these fragment-specific extraction attacks. To systematically investigate these attacks, we propose two data-blind methods: (1) a likelihood ratio attack inspired by methods from membership inference, and…
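The likelihood ratio idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the threshold, and the probability values below are all hypothetical. It assumes the attacker can obtain, for a candidate fragment, its probability under the fine-tuned target model and under a reference model that was not fine-tuned on the private data, both conditioned on the fragment the attacker already knows.

```python
import math


def likelihood_ratio_score(p_target: float, p_reference: float) -> float:
    """Log-likelihood ratio of a candidate fragment under two models.

    p_target: probability the fine-tuned model assigns to the candidate,
        given the known fragment (hypothetical input).
    p_reference: the same probability under a reference model that has
        not seen the private data (hypothetical input).
    """
    if not (0.0 < p_target <= 1.0 and 0.0 < p_reference <= 1.0):
        raise ValueError("probabilities must lie in (0, 1]")
    return math.log(p_target) - math.log(p_reference)


def infer_fragment(p_target: float, p_reference: float,
                   threshold: float = 0.0) -> bool:
    """Flag the candidate when the target model favors it more than the
    reference model does. The threshold is an assumed tuning knob."""
    return likelihood_ratio_score(p_target, p_reference) > threshold


# Made-up numbers for illustration only: the attacker knows "hypertension"
# and tests two candidate fragments.
candidates = {
    "osteoarthritis": (0.30, 0.05),  # (p_target, p_reference)
    "asthma": (0.04, 0.05),
}
flagged = [name for name, (pt, pr) in candidates.items()
           if infer_fragment(pt, pr)]
print(flagged)
```

With these invented values, only the candidate whose probability rises under the fine-tuned model relative to the reference is flagged. In practice the probabilities would come from model queries, and the threshold would have to be calibrated.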
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Data Quality and Management
