User Inference Attacks on Large Language Models

Nikhil Kandpal; Krishna Pillutla; Alina Oprea; Peter Kairouz; Christopher A. Choquette-Choo; Zheng Xu

arXiv:2310.09266 · cs.CR · February 27, 2024 · 2 cites

TL;DR

This paper investigates privacy risks in fine-tuned large language models, demonstrating vulnerability to user inference attacks and proposing partial mitigation strategies to enhance user data privacy.

Contribution

It introduces a black-box user inference attack on fine-tuned LLMs and analyzes factors influencing user vulnerability, along with evaluating mitigation techniques.

Findings

01. LLMs are highly susceptible to user inference attacks.

02. Outlier users and users whose examples share identifiable features are most vulnerable.

03. Mitigation methods like differential privacy offer partial protection.

Abstract

Fine-tuning is a common and effective method for tailoring large language models (LLMs) to specialized tasks and applications. In this paper, we study the privacy implications of fine-tuning LLMs on user data. To this end, we consider a realistic threat model, called user inference, wherein an attacker infers whether or not a user's data was used for fine-tuning. We design attacks for performing user inference that require only black-box access to the fine-tuned LLM and a few samples from a user which need not be from the fine-tuning dataset. We find that LLMs are susceptible to user inference across a variety of fine-tuning datasets, at times with near perfect attack success rates. Further, we theoretically and empirically investigate the properties that make users vulnerable to user inference, finding that outlier users, users with identifiable shared features between examples, and…
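The attack described in the abstract can be illustrated with a minimal sketch. It assumes, in line with the reviews' mention of a likelihood ratio test with the pre-trained model as reference, that the attacker averages the log-likelihood ratio between the fine-tuned model and a reference model over a user's samples and compares it to a threshold. The function names, the `LogProbFn` type, the threshold value, and the toy log-probabilities are all hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
from statistics import mean
from typing import Callable, Sequence

# Hypothetical interface: a function that returns log p(text) under some model.
# Black-box access to the fine-tuned LLM is assumed to provide this quantity.
LogProbFn = Callable[[str], float]


def user_inference_score(
    samples: Sequence[str],
    finetuned_logprob: LogProbFn,
    reference_logprob: LogProbFn,
) -> float:
    """Average log-likelihood ratio over a user's samples.

    A larger score means the fine-tuned model assigns the user's text
    relatively more probability than the reference model does.
    """
    if not samples:
        raise ValueError("at least one user sample is required")
    return mean([finetuned_logprob(x) - reference_logprob(x) for x in samples])


def infer_user(
    samples: Sequence[str],
    finetuned_logprob: LogProbFn,
    reference_logprob: LogProbFn,
    threshold: float = 0.0,  # assumed value; in practice it would be calibrated
) -> bool:
    """Guess that the user's data was used for fine-tuning if the score exceeds the threshold."""
    return user_inference_score(samples, finetuned_logprob, reference_logprob) > threshold


# Toy stand-ins for real models: fixed log-probabilities per text.
toy_finetuned = {"post A": -1.0, "post B": -2.0}
toy_reference = {"post A": -3.0, "post B": -2.5}

score = user_inference_score(
    ["post A", "post B"], toy_finetuned.__getitem__, toy_reference.__getitem__
)
guess = infer_user(
    ["post A", "post B"], toy_finetuned.__getitem__, toy_reference.__getitem__
)
```

Note that the samples passed in need not come from the fine-tuning set itself, which matches the threat model in the abstract: the attacker only needs a few samples written by the same user.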

Peer Reviews

Decision · Submitted to ICLR 2024

Reviewer 01 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

- 1) Understanding the extent to which fine-tuned LLMs leak the membership of users in a relaxed threat model (i.e., was any data of a user used to train the model) is an interesting question.
- 2) Extensive evaluation of the attack’s performance under different settings and mitigation techniques.

Weaknesses

- 1) Lack of novelty: the black-box methodology used is standard and https://arxiv.org/pdf/2304.02782.pdf already proposes the same threat model. Framing existing attack terminology (“user inference”) as something new is confusing as there are already multiple works proposing the stronger threat model where either X texts of a user’s data or none were used to train the model (e.g., Song and Shmatikov). To the best of my understanding, it seems that user-level + fine-tuning + LLM is new, but the

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 2

Strengths

The paper presents a clear problem statement and convincingly demonstrates how user inference, grounded in a more realistic assumption regarding access to user data, fills gaps in existing privacy attacks. The core idea is clearly illustrated through figures and well elaborated with theoretical and technical details in the paper. Additionally, the paper provides thorough experiments and analysis of the attack performance across different datasets and explores factors that may influence the per

Weaknesses

The basic assumption of user inference -- "samples from the same user are more similar on average than those from different users" -- could limit the applicability of the method. When merging samples from diverse tasks or domains during fine-tuning (e.g., blogs of different topics or on different media platforms), samples from the same domains might exhibit greater similarity than those from the same user. Also, since the pre-trained model is used as the reference model to calculate the test st

Reviewer 03 · Rating 8 (accept, good paper) · Confidence 3

Strengths

- The paper introduced a new realistic threat model called user inference, and proposed a practical attack using a likelihood ratio test. Experimental results show the effectiveness of the proposed attack method.
- The paper is well-written and enjoyable to read. Sufficient insights and informative figures are provided to help make the point.

Weaknesses

There is no obvious weakness in the draft.

Videos

User Inference Attacks on Large Language Models · Underline

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI)

Methods: Gradient Clipping · Early Stopping