User Inference Attacks on Large Language Models
Nikhil Kandpal, Krishna Pillutla, Alina Oprea, Peter Kairouz, Christopher A. Choquette-Choo, Zheng Xu

TL;DR
This paper investigates privacy risks in fine-tuned large language models, showing that they are vulnerable to user inference attacks and proposing mitigation strategies that partially protect user data privacy.
Contribution
It introduces a black-box user inference attack on fine-tuned LLMs, analyzes the factors that make users vulnerable to it, and evaluates mitigation techniques.
Findings
LLMs are highly susceptible to user inference attacks.
Outlier users and users whose examples share identifiable features are the most vulnerable.
Mitigation methods like differential privacy offer partial protection.
Abstract
Fine-tuning is a common and effective method for tailoring large language models (LLMs) to specialized tasks and applications. In this paper, we study the privacy implications of fine-tuning LLMs on user data. To this end, we consider a realistic threat model, called user inference, wherein an attacker infers whether or not a user's data was used for fine-tuning. We design attacks for performing user inference that require only black-box access to the fine-tuned LLM and a few samples from a user which need not be from the fine-tuning dataset. We find that LLMs are susceptible to user inference across a variety of fine-tuning datasets, at times with near perfect attack success rates. Further, we theoretically and empirically investigate the properties that make users vulnerable to user inference, finding that outlier users, users with identifiable shared features between examples, and…
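The abstract reports "attack success rates" without naming the metric. One common way to score a membership-style test over users, assumed for this sketch rather than taken from the summary, is ROC AUC: the probability that a randomly chosen user whose data was used for fine-tuning receives a higher attack score than a randomly chosen user whose data was not. The function name `roc_auc` and the score values are hypothetical.

```python
from typing import Sequence


def roc_auc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """Hypothetical helper: ROC AUC in its Mann-Whitney form.

    Returns the fraction of (member, non-member) user pairs in which the
    member user gets the higher attack score; ties count as half a win.
    Brute force over all pairs, which is fine for small illustrations.
    """
    if not member_scores or not nonmember_scores:
        raise ValueError("need at least one member and one non-member score")
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


# Made-up per-user attack scores (higher = "more likely a fine-tuning user").
members = [0.9, 0.7, 0.4]
nonmembers = [0.3, 0.5, 0.1]
print(roc_auc(members, nonmembers))  # 8 of 9 pairs ranked correctly
```

An AUC of 1.0 would mean every member user outscores every non-member user, which is what "near perfect" attack success would look like under this metric; 0.5 is chance.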
Peer Reviews
Decision: Submitted to ICLR 2024
- 1) Understanding the extent to which fine-tuned LLMs leak the membership of users in a relaxed threat model (i.e., was any data of a user used to train the model) is an interesting question.
- 2) Extensive evaluation of the attack’s performance under different settings and mitigation techniques.
- 1) Lack of novelty: the black-box methodology used is standard and https://arxiv.org/pdf/2304.02782.pdf already proposes the same threat model. Framing existing attack terminology (“user inference”) as something new is confusing as there are already multiple works proposing the stronger threat model where either X texts of a user’s data or none were used to train the model (e.g., Song and Shmatikov). To the best of my understanding, it seems that user-level + fine-tuning + LLM is new, but the
The paper presents a clear problem statement and convincingly demonstrates how user inference, grounded in a more realistic assumption regarding access to user data, fills gaps in existing privacy attacks. The core idea is clearly illustrated through figures and well elaborated with theoretical and technical details in the paper. Additionally, the paper provides thorough experiments and analysis of the attack performance across different datasets and explores factors that may influence the per
The basic assumption of user inference -- "samples from the same user are more similar on average than those from different users" -- could limit the applicability of the method. When merging samples from diverse tasks or domains during finetuning (e.g., blogs of different topics or on different media platforms), samples from the same domains might exhibit greater similarity than those from the same user. Also, since the pre-trained model is used as the reference model to calculate the test st
- The paper introduced a new realistic threat model called user inference and proposed a practical attack using a likelihood ratio test. Experimental results show the effectiveness of the proposed attack method.
- The paper is well-written and enjoyable to read. Sufficient insights and informative figures are provided to help make the point.
There is no obvious weakness in the draft.
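The reviews describe the attack as a likelihood ratio test that uses the pre-trained model as the reference model. A minimal sketch of such a user-level test, assuming the statistic is the average per-sample log-likelihood ratio log p_finetuned(x) - log p_reference(x) over the attacker's samples from one user, is below. The names `user_score`, `infer_user`, and `threshold` are hypothetical, and the per-sample log-likelihoods are assumed to come from black-box queries to each model (not shown).

```python
from typing import Sequence


def user_score(finetuned_logliks: Sequence[float], reference_logliks: Sequence[float]) -> float:
    """Hypothetical user-level statistic.

    Averages log p_ft(x) - log p_ref(x) over one user's samples. A large
    positive value means the fine-tuned model likes this user's text much
    more than the reference (pre-trained) model does.
    """
    if len(finetuned_logliks) != len(reference_logliks):
        raise ValueError("need one fine-tuned and one reference log-likelihood per sample")
    if not finetuned_logliks:
        raise ValueError("need at least one sample from the user")
    ratios = [ft - ref for ft, ref in zip(finetuned_logliks, reference_logliks)]
    return sum(ratios) / len(ratios)


def infer_user(
    finetuned_logliks: Sequence[float],
    reference_logliks: Sequence[float],
    threshold: float,
) -> bool:
    """Hypothetical decision rule: predict 'user was in fine-tuning' above the threshold."""
    return user_score(finetuned_logliks, reference_logliks) > threshold


# Made-up sequence log-likelihoods for three samples from one user.
ft = [-42.0, -37.5, -50.0]   # queried from the fine-tuned model
ref = [-45.0, -39.5, -51.0]  # queried from the pre-trained reference model
print(user_score(ft, ref))                 # per-sample ratios 3.0, 2.0, 1.0 -> mean 2.0
print(infer_user(ft, ref, threshold=1.0))  # True
```

Averaging over several samples is what makes this a user-level rather than a sample-level test: none of the attacker's samples needs to have been in the fine-tuning set, only to resemble the user's data. The reviewer's concern above maps directly onto this statistic, since a reference model that already fits a domain well shrinks every ratio for users from that domain.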
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI)
Methods: Gradient Clipping · Early Stopping
