The Tail Tells All: Estimating Model-Level Membership Inference Vulnerability Without Reference Models

Euodia Dodd; Nataša Krčo; Igor Shilov; Yves-Alexandre de Montjoye

arXiv:2510.19773·cs.LG·October 23, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces a new, reference-model-free method to estimate the privacy vulnerability of AI models against membership inference attacks by analyzing loss distribution characteristics.

Contribution

It proposes a novel approach to assess model vulnerability without expensive reference models, using loss distribution asymmetry and outlier absence as indicators.

Findings

01. Accurately estimates vulnerability to state-of-the-art MIAs

02. Outperforms low-cost reference-based attacks like RMIA

03. Effective for large language models

Abstract

Membership inference attacks (MIAs) have emerged as the standard tool for evaluating the privacy risks of AI models. However, state-of-the-art attacks require training numerous, often computationally expensive, reference models, limiting their practicality. We present a novel approach for estimating model-level vulnerability, the TPR at low FPR, to membership inference attacks without requiring reference models. Empirical analysis shows loss distributions to be asymmetric and heavy-tailed and suggests that most points at risk from MIAs have moved from the tail (high-loss region) to the head (low-loss region) of the distribution after training. We leverage this insight to propose a method to estimate model-level vulnerability from the training and testing distribution alone: using the absence of outliers from the high-loss region as a predictor of the risk. We evaluate our method, the…
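
The two quantities the abstract refers to can be illustrated with a minimal sketch. It assumes that model-level vulnerability is the TPR of a loss-threshold attack at a small fixed FPR, and that "absence of outliers from the high-loss region" can be measured as the share of non-member points whose loss exceeds almost every member loss (the true-negative rate at a small fixed false-negative rate). The function names, the operating points, and the synthetic losses are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def loss_tpr_at_fpr(member_losses, nonmember_losses, fpr=0.01):
    """TPR of a loss-threshold attack ("member if loss < t") at a fixed FPR."""
    # Pick t so that roughly `fpr` of non-members fall below it.
    threshold = np.quantile(nonmember_losses, fpr)
    return float(np.mean(member_losses < threshold))

def loss_tnr_at_fnr(member_losses, nonmember_losses, fnr=0.01):
    """TNR of the same attack when t sits in the high-loss tail (assumed proxy)."""
    # Pick t so that roughly `fnr` of members lie above it (missed members).
    threshold = np.quantile(member_losses, 1.0 - fnr)
    # Non-members above t are correctly rejected: the tail members have left.
    return float(np.mean(nonmember_losses > threshold))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical losses: members concentrated near zero, non-members heavy-tailed.
    members = rng.exponential(scale=0.05, size=10_000)
    nonmembers = rng.lognormal(mean=-1.0, sigma=1.0, size=10_000)
    print("TPR at 1% FPR:", loss_tpr_at_fpr(members, nonmembers))
    print("TNR at 1% FNR:", loss_tnr_at_fnr(members, nonmembers))
```

Both functions need only per-sample losses on training and held-out data, which is why no reference model appears in the sketch. Whether the paper uses this exact operating point is not stated in the excerpt above.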

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

The paper makes solid contributions. It substantially reduces the computational cost of estimating membership inference vulnerability by removing the need for reference models. It also establishes a strong empirical relationship between true MIA performance and its proposed proxy (the LOSS TNR metric), demonstrating that simple loss-based statistics can reliably estimate privacy risk. Finally, it conducts extensive experiments across diverse architectures and datasets, reinforcing the robustness

Weaknesses

My biggest concern is with the limited scale of the image experiments. The tails tend to disappear when the generalization gap is low, for example when fine-tuning a large transformer model (like ViT) on CIFAR datasets. I think this represents an important case for the authors to consider.

Reviewer 02 · Rating 8 · Confidence 4

Strengths

- The proposed method is way more efficient than previous approaches.
- The method is a very good and efficient indicator to approximate vulnerability to MIAs after training a model.
- The paper was very easy to read and to follow.
- With an adaptation of the LOSS TNR to the LOSS AUC, the method can even be applied to LLMs.

Weaknesses

- While LiRA and RMIA are computationally more demanding, these attacks can be used to predict membership for individual samples. The proposed method cannot predict membership for individual samples, but only estimates the vulnerability to membership inference attacks on a model level.

Misc:
- In line 82, the sentence seems to be incomplete and has "achieve" two times within the sentence.
- In line 260, "Appendix" and the closing brackets are missing.

Reviewer 03 · Rating 2 · Confidence 4

Strengths

- The empirical results cover multiple datasets and architectures, demonstrating correlation between the proposed metric (LOSS TNR) and LiRA's TPR at low FPR.
- The paper is clearly written and easy to follow.

Weaknesses

- The paper focuses on model-level vulnerability estimation, but it is unclear what the real-world application scenario of such a metric is. In privacy evaluation, MIAs are primarily defined as worst-case, sample-level privacy breaches (determining whether a particular record was in training), as evidenced by Carlini et al. A model-level average metric offers little actionable guidance: practitioners either need to test specific data records (for auditing) or evaluate defense mechanisms under r

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Information and Cyber Security · Ethics and Social Impacts of AI