Beyond Memorization: Violating Privacy Via Inference with Large Language Models
Robin Staab, Mark Vero, Mislav Balunović, Martin Vechev

TL;DR
This paper demonstrates that large language models can infer personal attributes from user text with high accuracy, posing significant privacy risks that current mitigation strategies fail to address effectively.
Contribution
It provides the first comprehensive analysis of LLMs' ability to infer personal data from text, highlighting privacy risks that go beyond memorization and showing that existing defenses are ineffective.
Findings
LLMs can infer personal attributes with up to 85% top-1 accuracy.
Current privacy mitigations like anonymization are ineffective.
LLMs can infer personal data faster and more cheaply than humans.
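The accuracy figures above are top-k metrics: a prediction counts as correct under top-1 only if the model's first guess matches the ground truth, and under top-3 if any of its first three guesses does. A minimal Python sketch of that computation follows. The function name, the toy data, and the case-insensitive exact-match rule are all assumptions made for illustration; the paper's own scoring procedure is not described in this summary and may be more lenient.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_guesses: Sequence[Sequence[str]],
    truths: Sequence[str],
    k: int,
) -> float:
    """Fraction of examples whose true label is among the first k guesses.

    Matching is case-insensitive exact string equality (an assumption made
    for this sketch, not necessarily how the paper scores predictions).
    """
    if len(ranked_guesses) != len(truths):
        raise ValueError("ranked_guesses and truths must have the same length")
    if not truths:
        return 0.0
    hits = 0
    for guesses, truth in zip(ranked_guesses, truths):
        # Only the first k guesses are allowed to count as a hit.
        candidates = {g.strip().lower() for g in guesses[:k]}
        if truth.strip().lower() in candidates:
            hits += 1
    return hits / len(truths)


# Hypothetical toy data: ranked location guesses for four profiles.
guesses = [
    ["Zurich", "Bern", "Basel"],
    ["Boston", "Melbourne", "Sydney"],
    ["Berlin", "Munich", "Hamburg"],
    ["Paris", "Lyon", "Nice"],
]
truths = ["Zurich", "Melbourne", "Vienna", "Paris"]

top1 = top_k_accuracy(guesses, truths, k=1)  # 2 of 4 first guesses match
top3 = top_k_accuracy(guesses, truths, k=3)  # 3 of 4 match within three guesses
print(f"top-1: {top1:.2f}, top-3: {top3:.2f}")
```

Top-3 accuracy can never be lower than top-1 on the same data, since every top-1 hit is also a top-3 hit.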
Abstract
Current privacy research on large language models (LLMs) primarily focuses on the issue of extracting memorized training data. At the same time, models' inference capabilities have increased drastically. This raises the key question of whether current LLMs could violate individuals' privacy by inferring personal attributes from text given at inference time. In this work, we present the first comprehensive study on the capabilities of pretrained LLMs to infer personal attributes from text. We construct a dataset consisting of real Reddit profiles, and show that current LLMs can infer a wide range of personal attributes (e.g., location, income, sex), achieving high top-1 and top-3 accuracy (up to 85% top-1) at a fraction of the cost and time required by humans. As people increasingly interact with LLM-powered chatbots across all aspects of life, we also explore…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Privacy, Security, and Data Protection
