Membership Inference Attacks Against In-Context Learning
Rui Wen, Zheng Li, Michael Backes, Yang Zhang

TL;DR
This paper introduces the first membership inference attack against In-Context Learning in large language models, demonstrating high accuracy and proposing defenses to mitigate privacy risks.
Contribution
It presents four attack strategies for ICL that rely only on generated text, evaluates them on four popular large language models, and explores defense mechanisms to reduce privacy leakage.
Findings
Attacks reach up to a 95% accuracy advantage in membership inference (reported against LLaMA).
Hybrid attack outperforms individual strategies in most cases.
Combining multiple defenses significantly reduces privacy leakage.
Abstract
Adapting Large Language Models (LLMs) to specific tasks introduces concerns about computational efficiency, prompting an exploration of efficient methods such as In-Context Learning (ICL). However, the vulnerability of ICL to privacy attacks under realistic assumptions remains largely unexplored. In this work, we present the first membership inference attack tailored for ICL, relying solely on generated texts without their associated probabilities. We propose four attack strategies tailored to various constrained scenarios and conduct extensive experiments on four popular large language models. Empirical results show that our attacks can accurately determine membership status in most cases, e.g., 95% accuracy advantage against LLaMA, indicating that the associated risks are much higher than those shown by existing probability-based attacks. Additionally, we propose a hybrid attack that…
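The abstract reports results as an "accuracy advantage" without defining it here. A minimal sketch of one common reading follows, assuming the standard membership advantage (true positive rate minus false positive rate), which on a balanced evaluation set equals 2 × (accuracy − 0.5). Whether the paper uses exactly this definition is an assumption, and the function names and toy labels below are hypothetical illustrations, not the authors' code or data.

```python
def membership_advantage(predictions, labels):
    """Return TPR - FPR for binary membership guesses (1 = member).

    Assumed definition; the source text does not spell out the metric.
    """
    member_preds = [p for p, y in zip(predictions, labels) if y == 1]
    non_member_preds = [p for p, y in zip(predictions, labels) if y == 0]
    if not member_preds or not non_member_preds:
        raise ValueError("need at least one member and one non-member")
    tpr = sum(member_preds) / len(member_preds)          # members flagged as members
    fpr = sum(non_member_preds) / len(non_member_preds)  # non-members flagged as members
    return tpr - fpr


def accuracy(predictions, labels):
    """Fraction of membership guesses that match the ground truth."""
    return sum(int(p == y) for p, y in zip(predictions, labels)) / len(labels)


# Toy, made-up example: 4 members and 4 non-members (balanced).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 1]

adv = membership_advantage(predictions, labels)
acc = accuracy(predictions, labels)
print(adv, acc, 2 * (acc - 0.5))
```

Under this reading, an advantage of 0 corresponds to random guessing and an advantage of 1 to a perfect attack, so a 95% advantage on a balanced set would correspond to 97.5% raw accuracy.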
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Machine Learning in Healthcare
Methods: LLaMA
