Can We Infer Confidential Properties of Training Data from LLMs?
Pengrun Huang, Chhavi Yadav, Kamalika Chaudhuri, Ruihan Wu

TL;DR
This paper introduces PropInfer, a benchmark for property inference attacks on fine-tuned large language models, and shows that tailored attacks can recover sensitive dataset-level properties of the fine-tuning data.
Contribution
The paper contributes a benchmark and tailored attack techniques for property inference on LLMs, highlighting a privacy vulnerability that prior work had studied only for discriminative models and image-generating models such as GANs.
Findings
Attacks successfully infer confidential dataset-level properties from fine-tuned LLMs.
Prompt-based and shadow-model attacks outperform baseline methods.
Vulnerabilities exist across multiple pretrained LLMs.
Abstract
Large language models (LLMs) are increasingly fine-tuned on domain-specific datasets to support applications in fields such as healthcare, finance, and law. These fine-tuning datasets often have sensitive and confidential dataset-level properties -- such as patient demographics or disease prevalence -- that are not intended to be revealed. While prior work has studied property inference attacks on discriminative models (e.g., image classification models) and generative models (e.g., GANs for image data), it remains unclear if such attacks transfer to LLMs. In this work, we introduce PropInfer, a benchmark task for evaluating property inference in LLMs under two fine-tuning paradigms: question-answering and chat-completion. Built on the ChatDoctor dataset, our benchmark includes a range of property types and task configurations. We further propose two tailored attacks: a prompt-based…
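To make the notion of a dataset-level property concrete, here is a minimal sketch of how an adversary might estimate one (say, the share of records mentioning a given symptom) from text sampled from a fine-tuned model. It is an illustration under stated assumptions, not the attack the paper proposes: the `generated` strings are hypothetical stand-ins for model outputs, and `keyword_labeler` and `estimate_property_ratio` are illustrative helper names that do not come from the source.

```python
import math
import re
from typing import Callable, Iterable, Sequence, Tuple


def keyword_labeler(keywords: Iterable[str]) -> Callable[[str], bool]:
    """Toy labeler (assumption): True if any keyword appears as a whole phrase."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return lambda text: bool(pattern.search(text))


def estimate_property_ratio(
    samples: Sequence[str],
    has_property: Callable[[str], bool],
    z: float = 1.96,
) -> Tuple[float, Tuple[float, float]]:
    """Estimate the fraction of samples with the property.

    Returns the point estimate and a normal-approximation interval
    (z = 1.96 corresponds to roughly 95% coverage), clipped to [0, 1].
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    p = sum(1 for s in samples if has_property(s)) / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, (max(0.0, p - half_width), min(1.0, p + half_width))


# Hypothetical stand-ins for text sampled from a fine-tuned model.
generated = [
    "Patient reports chest pain and shortness of breath.",
    "I have had a persistent cough for two weeks.",
    "My child has a fever and a sore throat.",
    "Chest pain started after exercise yesterday.",
]

labeler = keyword_labeler(["chest pain"])
ratio, interval = estimate_property_ratio(generated, labeler)
print(f"estimated ratio: {ratio:.2f}")  # prints "estimated ratio: 0.50"
```

With only four samples the interval is almost uninformative; a realistic estimate would need many more generations and a far more reliable labeler than keyword matching.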
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Machine Learning in Healthcare · Topic Modeling
