Exploring the limits of strong membership inference attacks on large language models

Jamie Hayes; Ilia Shumailov; Christopher A. Choquette-Choo; Matthew Jagielski; George Kaissis; Milad Nasr; Sahra Ghalebikesabi; Meenatchi Sundaram Mutu Selva Annamalai; Niloofar Mireshghallah; Igor Shilov; Matthieu Meeus; Yves-Alexandre de Montjoye; Katherine Lee; Franziska Boenisch; Adam Dziedzic; A. Feder Cooper

arXiv:2505.18773 · cs.CR · January 9, 2026

TL;DR

This paper investigates the effectiveness of strong membership inference attacks on large language models, revealing their limited success and instability, and challenging prior assumptions about LLM privacy vulnerabilities.

Contribution

It scales LiRA, a strong MIA, to GPT-2 models of 10M to 1B parameters, showing that the attack has limited practical effectiveness, that many of its decisions are unstable, and that its success relates to other privacy metrics in complex ways.

Findings

01

Strong MIAs can succeed on large LLMs but with limited effectiveness.

02

Many MIA decisions are unstable and indistinguishable from random chance.

03

The relationship between MIA success and privacy metrics is complex.

Abstract

State-of-the-art membership inference attacks (MIAs) typically require training many reference models, making it difficult to scale these attacks to large pre-trained language models (LLMs). As a result, prior research has either relied on weaker attacks that avoid training reference models (e.g., fine-tuning attacks), or on stronger attacks applied to small models and datasets. However, weaker attacks have been shown to be brittle, and insights from strong attacks in simplified settings do not translate to today's LLMs. These challenges prompt an important question: are the limitations observed in prior work due to attack design choices, or are MIAs fundamentally ineffective on LLMs? We address this question by scaling LiRA, one of the strongest MIAs, to GPT-2 architectures ranging from 10M to 1B parameters, training reference models on over 20B tokens from the C4 dataset. Our results advance the…
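
As a rough illustration of the reference-model idea behind LiRA mentioned in the abstract, here is a minimal Python sketch. It is not the authors' implementation. It assumes a scalar per-example statistic (for instance, the negative mean token loss) has already been computed on the target model and on reference models trained with ("in") and without ("out") the example. The function names `gaussian_logpdf` and `lira_score`, the `min_sigma` floor, and the toy numbers are all hypothetical.

```python
import math
from statistics import mean, stdev


def gaussian_logpdf(x, mu, sigma):
    """Log-density of a normal distribution N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def lira_score(target_stat, in_stats, out_stats, min_sigma=1e-6):
    """Log-likelihood ratio that the example was a training member.

    target_stat: statistic of the example under the target model.
    in_stats:    statistics from reference models trained WITH the example.
    out_stats:   statistics from reference models trained WITHOUT it.
    A positive score favors "member"; a negative score favors "non-member".
    """
    # Fit one Gaussian per hypothesis (assumption: statistics are roughly normal).
    mu_in, mu_out = mean(in_stats), mean(out_stats)
    sigma_in = max(stdev(in_stats), min_sigma)    # floor avoids division by zero
    sigma_out = max(stdev(out_stats), min_sigma)
    return (gaussian_logpdf(target_stat, mu_in, sigma_in)
            - gaussian_logpdf(target_stat, mu_out, sigma_out))


# Toy, made-up statistics (negative loss: members tend to have higher values).
in_stats = [-1.0, -1.2, -0.8, -1.1]
out_stats = [-3.0, -3.2, -2.8, -3.1]

member_like = lira_score(-1.0, in_stats, out_stats)
non_member_like = lira_score(-3.0, in_stats, out_stats)
print("member-like score is positive:", member_like > 0)
print("non-member-like score is negative:", non_member_like < 0)
```

Each extra reference model adds one value to `in_stats` or `out_stats`, which is why the cost of this kind of attack grows with the number and size of reference models that must be trained.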

Taxonomy

Topics: Topic Modeling

Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Dense Connections · Linear Warmup With Cosine Annealing · Attention Dropout · Softmax · Weight Decay · Multi-Head Attention