Blind Baselines Beat Membership Inference Attacks for Foundation Models
Debeshee Das, Jie Zhang, Florian Tramèr

TL;DR
This paper demonstrates that simple, distribution-based blind attacks outperform sophisticated membership inference attacks on foundation models, revealing flaws in current evaluation methods and questioning their effectiveness in measuring membership leakage.
Contribution
It shows that blind, distribution-based attacks outperform existing MI attacks and exposes flaws in current evaluation practices for foundation models.
Findings
Blind attacks outperform state-of-the-art MI attacks
Existing evaluations are flawed due to distribution mismatches
Current MI attack assessments do not reliably measure membership leakage
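These findings rest on the standard way MI attacks are scored. The sketch below makes that concrete under stated assumptions: `roc_auc` is a name introduced here, and the member and non-member scores are hypothetical toy numbers, not results from the paper. It computes ROC AUC as the probability that a randomly chosen member outscores a randomly chosen non-member. If a score computed from the text alone, without querying any model, still reaches an AUC well above 0.5, it is detecting a member/non-member distribution shift rather than membership leakage.

```python
from typing import Sequence


def roc_auc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic: the fraction of
    (member, non-member) pairs in which the member gets the higher score,
    with ties counted as one half."""
    if not member_scores or not nonmember_scores:
        raise ValueError("need at least one member and one non-member score")
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


# Hypothetical toy scores from a "blind" text feature; no model is queried.
members = [0.9, 0.8, 0.7, 0.6]
nonmembers = [0.5, 0.4, 0.65, 0.3]
print(roc_auc(members, nonmembers))  # 15 of 16 pairs ranked correctly -> 0.9375
```

An AUC of 0.5 corresponds to random guessing. The pairwise loop is quadratic in the number of samples, which keeps the sketch dependency-free but is slow for large evaluation sets.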
Abstract
Membership inference (MI) attacks try to determine if a data sample was used to train a machine learning model. For foundation models trained on unknown Web data, MI attacks are often used to detect copyrighted training materials, measure test set contamination, or audit machine unlearning. Unfortunately, we find that evaluations of MI attacks for foundation models are flawed, because they sample members and non-members from different distributions. For 8 published MI evaluation datasets, we show that blind attacks -- that distinguish the member and non-member distributions without looking at any trained model -- outperform state-of-the-art MI attacks. Existing evaluations thus tell us nothing about membership leakage of a foundation model's training data.
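One way a blind attack can exploit such a distribution shift is sketched below under stated assumptions. It is an illustration, not the authors' exact implementation. Suppose, hypothetically, that members were collected before a known training cutoff and non-members afterwards, so non-members are more likely to mention recent years. The score reads only the text: it extracts four-digit years with a regular expression and guesses "non-member" when the most recent year falls after the cutoff. The cutoff year, the regex, and the names `latest_year` and `blind_membership_guess` are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical pattern: standalone four-digit years from 1900 to 2099.
YEAR_PATTERN = re.compile(r"\b(19\d{2}|20\d{2})\b")


def latest_year(text: str) -> Optional[int]:
    """Return the most recent year mentioned in the text, or None if no year appears."""
    years = [int(y) for y in YEAR_PATTERN.findall(text)]
    return max(years) if years else None


def blind_membership_guess(text: str, cutoff_year: int) -> bool:
    """Guess 'member' (True) unless the text mentions a year after the
    assumed training cutoff. No trained model is consulted."""
    year = latest_year(text)
    return year is None or year <= cutoff_year


samples = [
    "The 2019 season ended with a record attendance.",
    "Results from the 2024 election were certified in December.",
    "A timeless recipe for sourdough bread.",
]
for s in samples:
    # Prints True, False, True for the three samples with cutoff_year=2022.
    print(blind_membership_guess(s, cutoff_year=2022), s)
```

Texts without any year default to "member" here, so such a rule only helps on the subset of samples that carry a temporal signal. Turning `latest_year` into a continuous score (for example, treating a missing year as the lowest score) would make it compatible with threshold-free metrics such as ROC AUC.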
Peer Reviews
Decision: Submitted to ICLR 2025
- While this topic has been explored by several recent works (both concurrent and prior), this work goes a step further by demonstrating the extent of distributional differences between members and non-members, for both LLM and VLM evaluation data for membership inference. - The paper is well written and supports most of its claims with empirical evidence and extensive evaluation.
The submission has significant issues regarding originality and the characterization of related work. The authors' framing of certain works as "concurrent" appears to minimize substantial overlaps, particularly with [1] and [2] which preceded the ICLR deadline by 4 and 8 months respectively. This timeframe makes it difficult to justify as concurrent research. The paper's main conclusion about flawed non-member selection methods introducing detectable distributional shifts largely mirrors the fin
- The paper investigates a significant issue in MIA research, highlighting the importance of unbiased evaluation datasets for accurately benchmarking attack effectiveness on text-based large foundation models. - It provides a systematic evaluation of various datasets and baseline attacks, identifying three common distribution shift patterns that influence the success of MIAs.
- The authors claim that current state-of-the-art MIAs fail to extract meaningful membership information, based solely on evaluation results over biased datasets. However, this assertion may be overstated: blind attacks use dataset-specific prior information (e.g., timestamps), which state-of-the-art attacks may deliberately avoid because they aim to be general. These attacks might still capture useful membership signals, albeit weaker than the dataset-specific prior info
- This paper's focus on the flaws of MI evaluation datasets is important, especially in an era where foundation models are widely deployed. - This paper analyzes 9 published MI evaluation datasets, demonstrating that blind attacks outperform existing MI attacks on these datasets. This reveals the incompleteness of current MI evaluations. - The attack methods proposed in this paper perform exceptionally well, showing significant performance improvements over existing MI attacks.
- The setup of the comparison experiments is unclear. Were the same data conditions used in the experiment section? (see Q1) - The core of this paper is to point out the shortcomings of existing MI attacks on foundation models. However, the introduction does not revolve around this point; instead, it focuses on how simple attacks can also achieve good results. It is recommended to restructure the introduction to highlight the paper's main contributions. - The experimental sec
Taxonomy
Topics: Digital and Cyber Forensics · Security and Verification in Computing · Access Control and Trust
Methods: Sparse Evolutionary Training
