SoK: Membership Inference Attacks on LLMs are Rushing Nowhere (and How to Fix It)

Matthieu Meeus; Igor Shilov; Shubham Jain; Manuel Faysse; Marek Rei; Yves-Alexandre de Montjoye

arXiv:2406.17975 · cs.CL · March 10, 2025

TL;DR

This paper critically reviews membership inference attacks (MIAs) on large language models, shows that distribution shifts between member and non-member datasets undermine current evaluations, and proposes evaluation strategies that better capture model memorization.

Contribution

It provides a comprehensive review of MIA research on LLMs, quantifies the distribution shift between members and non-members in post-hoc datasets, and suggests evaluation methods that make MIA assessments more reliable.
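
A distribution shift of this kind can be checked without querying the target model at all: if a classifier that sees only the text separates members from non-members well, an MIA evaluated on that dataset can score highly without detecting any memorization. The sketch below is a minimal illustration under stated assumptions, not necessarily the classifier used in the paper: it uses a multinomial naive Bayes bag-of-words model as a stand-in text-only classifier, scores it by ROC AUC, and runs on a hypothetical synthetic corpus in which a year token plays the role of a temporal cue. All function and variable names are hypothetical.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Crude bag of words: lowercase and split on whitespace.
    return text.lower().split()


def train_naive_bayes(docs, labels):
    """Multinomial naive Bayes over word counts; label 1 = member, 0 = non-member."""
    counts = {0: Counter(), 1: Counter()}
    for doc, y in zip(docs, labels):
        counts[y].update(tokenize(doc))
    priors = Counter(labels)
    vocab = set(counts[0]) | set(counts[1])
    totals = {y: sum(counts[y].values()) for y in (0, 1)}
    return counts, priors, vocab, totals


def member_score(model, doc):
    """Log-odds of 'member' computed from the text alone, with add-one smoothing."""
    counts, priors, vocab, totals = model
    v = len(vocab)
    score = math.log(priors[1] / priors[0])
    for tok in tokenize(doc):
        if tok not in vocab:  # skip words never seen in training
            continue
        p1 = (counts[1][tok] + 1) / (totals[1] + v)
        p0 = (counts[0][tok] + 1) / (totals[0] + v)
        score += math.log(p1 / p0)
    return score


def roc_auc(scores, labels):
    """P(random member outscores random non-member); ties count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def blind_auc(members, non_members, rng, train_frac=0.75):
    """Train on a stratified split of both sets, return held-out AUC. No target LLM is queried."""
    m, n = members[:], non_members[:]
    rng.shuffle(m)
    rng.shuffle(n)
    km, kn = int(len(m) * train_frac), int(len(n) * train_frac)
    model = train_naive_bayes(m[:km] + n[:kn], [1] * km + [0] * kn)
    test_docs = m[km:] + n[kn:]
    test_labels = [1] * (len(m) - km) + [0] * (len(n) - kn)
    return roc_auc([member_score(model, d) for d in test_docs], test_labels)


# Hypothetical toy corpus: same topic-word distribution everywhere; only the trailing year varies.
rng = random.Random(0)
TOPICS = ["model", "data", "paper", "results", "method", "training"]


def make_doc(years):
    return " ".join(rng.choices(TOPICS, k=8) + ["in", rng.choice(years)])


# Post-hoc-style sets: members predate a hypothetical cutoff, non-members postdate it.
shifted_auc = blind_auc([make_doc(["2020", "2021"]) for _ in range(200)],
                        [make_doc(["2023", "2024"]) for _ in range(200)], rng)
# Control: both sets drawn from the same distribution, as a randomized split would give.
ALL_YEARS = ["2020", "2021", "2023", "2024"]
control_auc = blind_auc([make_doc(ALL_YEARS) for _ in range(200)],
                        [make_doc(ALL_YEARS) for _ in range(200)], rng)
print(f"shifted: {shifted_auc:.2f}  control: {control_auc:.2f}")
```

On the shifted toy sets the text-only AUC should sit near 1.0 while the control stays near 0.5; a real check would apply the same idea to actual post-hoc member and non-member texts.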

Findings

1. Post-hoc datasets suffer from strong distribution shifts between members and non-members.
2. Current MIAs may overestimate memorization, since their success can partly reflect these dataset biases.
3. Proposed evaluation strategies include randomized splits and injections.
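
Randomized splits and injections both restore the property that post-hoc datasets lack: membership is decided at random before training. The sketch below illustrates that general idea under simplifying assumptions: the candidate sequences are hypothetical random-token canaries, each is assigned to the member or non-member set by a coin flip, and only members are inserted into a toy training corpus. A randomized split of natural data works the same way with real documents as the candidates. All names (`randomized_split`, `make_canaries`, `inject`) are illustrative and are not taken from the paper or its repository.

```python
import random


def randomized_split(candidates, seed=0):
    """Assign each candidate to members or non-members by a fair coin flip, fixed before training.

    Assignment is independent of content, so both sets are drawn from the same
    distribution by construction and an MIA cannot exploit a shift between them.
    """
    rng = random.Random(seed)
    members, non_members = [], []
    for text in candidates:
        (members if rng.random() < 0.5 else non_members).append(text)
    return members, non_members


def make_canaries(n, length=12, seed=0):
    """Hypothetical canaries: sequences of random 5-letter 'words' unlikely to occur naturally."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    return [" ".join("".join(rng.choices(letters, k=5)) for _ in range(length))
            for _ in range(n)]


def inject(corpus, members, seed=0):
    """Insert each member at a random position in a copy of the corpus; non-members are never inserted."""
    rng = random.Random(seed)
    out = list(corpus)
    for seq in members:
        out.insert(rng.randrange(len(out) + 1), seq)
    return out


canaries = make_canaries(100)
members, non_members = randomized_split(canaries, seed=1)
training_corpus = inject(["doc one", "doc two", "doc three"], members, seed=2)
print(len(members), len(non_members), len(training_corpus))
```

An MIA would then be evaluated on `members` versus `non_members` after training a model on `training_corpus`; the training step itself is outside this sketch.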

Abstract

Whether LLMs memorize their training data and what this means, from measuring privacy leakage to detecting copyright violations, has become a rapidly growing area of research. In the last few months, more than 10 new methods have been proposed to perform Membership Inference Attacks (MIAs) against LLMs. Contrary to traditional MIAs, which rely on fixed but randomized records or models, these methods are mostly trained and tested on datasets collected post-hoc. Sets of members and non-members, used to evaluate the MIA, are constructed using informed guesses after the release of a model. This lack of randomization raises concerns of a distribution shift between members and non-members. In this work, we first extensively review the literature on MIAs against LLMs and show that, while most work focuses on sequence-level MIAs evaluated in post-hoc setups, a range of target models, motivations…

Code & Models

Repositories

computationalprivacy/mia_llms_benchmark (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Healthcare