TL;DR
This paper introduces EZ-MIA, a training-free membership inference attack on fine-tuned language models. It improves detection of training-data membership by exploiting error positions (tokens the model predicts incorrectly), and needs only two forward passes per query.
Contribution
EZ-MIA is a training-free attack that scores probability shifts at error positions, relative to a pretrained reference model, to raise membership inference detection rates in language models.
Findings
EZ-MIA achieves a 3.8x higher detection rate than the prior state of the art on GPT-2.
At a 0.1% false-positive rate (FPR), EZ-MIA detects 8x more training-set members.
The method extends effectively to larger models like Llama-2-7B.
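The findings above are stated as detection at a fixed false-positive rate. As a minimal sketch of how such a number is commonly computed (this is a generic evaluation helper, not code from the paper; the function name `tpr_at_fpr` and the convention that higher scores mean "more likely a member" are assumptions), one can set the threshold from non-member scores and count the members that exceed it:

```python
import numpy as np

def tpr_at_fpr(member_scores, nonmember_scores, target_fpr):
    """True-positive rate when at most `target_fpr` of non-members are flagged.

    Assumes higher score = more likely to be a training member (hypothetical convention).
    """
    member_scores = np.asarray(member_scores, dtype=float)
    # Sort non-member scores from highest to lowest.
    nonmember_sorted = np.sort(np.asarray(nonmember_scores, dtype=float))[::-1]
    # Number of non-members we are allowed to flag by mistake.
    k = int(np.floor(target_fpr * len(nonmember_sorted)))
    # Flag only scores strictly above the (k+1)-th largest non-member score,
    # so that at most k non-members are flagged.
    threshold = nonmember_sorted[k] if k < len(nonmember_sorted) else -np.inf
    return float(np.mean(member_scores > threshold))
```

With only a handful of non-member samples a 0.1% FPR threshold cannot be resolved, so an audit at that operating point needs at least on the order of a thousand non-member queries.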
Abstract
Fine-tuned language models pose significant privacy risks, as they may memorize and expose sensitive information from their training data. Membership inference attacks (MIAs) provide a principled framework for auditing these risks, yet existing methods achieve limited detection rates, particularly at the low false-positive thresholds required for practical privacy auditing. We present EZ-MIA, a membership inference attack that exploits a key observation: memorization manifests most strongly at error positions, specifically tokens where the model predicts incorrectly yet still shows elevated probability for training examples. We introduce the Error Zone (EZ) score, which measures the directional imbalance of probability shifts at error positions relative to a pretrained reference model. This principled statistic requires only two forward passes per query and no model training of any…
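The abstract describes the EZ score only in words, so the following is a rough sketch of one plausible reading, not the paper's definition. It assumes that an error position is a token where the fine-tuned model's top-1 prediction differs from the true token, that the "shift" is the change in true-token probability from the reference model to the fine-tuned model, and that "directional imbalance" can be summarized as a normalized difference between upward and downward shifts. The function name `ez_score_sketch` and the normalization are hypothetical.

```python
import numpy as np

def ez_score_sketch(target_logprobs, ref_logprobs, token_ids, eps=1e-12):
    """Illustrative imbalance of probability shifts at error positions.

    target_logprobs, ref_logprobs: (seq_len, vocab) log-probabilities from the
        fine-tuned model and the pretrained reference model (two forward passes).
    token_ids: (seq_len,) true next-token ids.
    Returns a value in [-1, 1]; the exact formula is an assumption, not the paper's.
    """
    target_logprobs = np.asarray(target_logprobs, dtype=float)
    ref_logprobs = np.asarray(ref_logprobs, dtype=float)
    token_ids = np.asarray(token_ids, dtype=int)
    idx = np.arange(len(token_ids))

    # Error positions: the fine-tuned model's top-1 token is not the true token.
    is_error = target_logprobs.argmax(axis=1) != token_ids

    # Shift in true-token probability, fine-tuned minus reference.
    shift = np.exp(target_logprobs[idx, token_ids]) - np.exp(ref_logprobs[idx, token_ids])
    shift = shift[is_error]

    up = shift[shift > 0].sum()      # total upward shift at error positions
    down = -shift[shift < 0].sum()   # total downward shift (as a positive number)
    # Normalized imbalance; 0.0 when there are no error positions or no shift.
    return float((up - down) / (up + down + eps))
```

Under this reading, a sequence seen during fine-tuning would tend to score closer to 1, because its true tokens gain probability even where the model still predicts something else; the resulting scores can then be thresholded at a chosen false-positive rate.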
