Membership Inference on LLMs in the Wild
Jiatong Yi, Yanyang Li

TL;DR
This paper introduces SimMIA, a new framework for membership inference attacks on large language models that operates effectively in black-box settings using only generated text, and presents a benchmark for evaluation.
Contribution
The paper proposes SimMIA, a novel black-box MIA method for LLMs, and introduces WikiMIA-25, a benchmark dataset for evaluating MIA performance on proprietary models.
Findings
SimMIA achieves state-of-the-art results in black-box MIA scenarios.
SimMIA rivals methods that use internal model information.
The benchmark WikiMIA-25 enables standardized evaluation of MIA techniques.
Abstract
Membership Inference Attacks (MIAs) act as a crucial auditing tool for the opaque training data of Large Language Models (LLMs). However, existing techniques predominantly rely on inaccessible model internals (e.g., logits) or suffer from poor generalization across domains in strict black-box settings where only generated text is available. In this work, we propose SimMIA, a robust MIA framework tailored for this text-only regime by leveraging an advanced sampling strategy and scoring mechanism. Furthermore, we present WikiMIA-25, a new benchmark curated to evaluate MIA performance on modern proprietary LLMs. Experiments demonstrate that SimMIA achieves state-of-the-art results in the black-box setting, rivaling baselines that exploit internal model information.
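The abstract does not spell out SimMIA's sampling strategy or scoring mechanism, so the following is only a minimal sketch of the general text-only idea, not the authors' method. It assumes a generic recipe: split a candidate text into a prefix and a suffix, sample several continuations of the prefix from the model, and score membership by how closely the samples match the true suffix. The names `token_overlap`, `membership_score`, `sample_fn`, and `toy_sample_fn` are hypothetical, and word-level Jaccard overlap stands in for whatever similarity measure the paper actually uses.

```python
from typing import Callable, List


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the lowercase word sets of two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def membership_score(
    text: str,
    sample_fn: Callable[[str, int], List[str]],  # hypothetical black-box sampler
    num_samples: int = 8,
    prefix_ratio: float = 0.5,
) -> float:
    """Mean similarity between sampled continuations and the true suffix.

    Higher scores suggest the model reproduces the text more readily,
    which this sketch treats as weak evidence of membership.
    """
    words = text.split()
    cut = max(1, int(len(words) * prefix_ratio))
    prefix, suffix = " ".join(words[:cut]), " ".join(words[cut:])
    samples = sample_fn(prefix, num_samples)
    if not samples:
        raise ValueError("sample_fn returned no continuations")
    return sum(token_overlap(s, suffix) for s in samples) / len(samples)


# Toy stand-in for a black-box LLM that has memorized one sentence.
MEMORIZED = "the quick brown fox jumps over the lazy dog"


def toy_sample_fn(prefix: str, n: int) -> List[str]:
    """Return n identical continuations; a real sampler would call a model API."""
    if MEMORIZED.startswith(prefix):
        return [MEMORIZED[len(prefix):].strip()] * n
    return ["some unrelated generic continuation"] * n


member = membership_score(MEMORIZED, toy_sample_fn)
non_member = membership_score(
    "a completely different sentence about cats and dogs here", toy_sample_fn
)
```

In practice `sample_fn` would wrap calls to the target model's text-generation endpoint, and a decision threshold on the score would be calibrated on texts with known membership status.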
Taxonomy
Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Natural Language Processing Techniques
