Membership Inference Attacks for Unseen Classes
Pratiksha Thaker, Neil Kale, Zhiwei Steven Wu, Virginia Smith

TL;DR
This paper reveals limitations of shadow model-based membership inference attacks, especially on unseen classes, and proposes quantile regression attacks as a more effective alternative, with significant empirical and theoretical support.
Contribution
It identifies a fundamental flaw in shadow model attacks for unseen classes and introduces quantile regression attacks that outperform existing methods both empirically and theoretically.
Findings
Shadow model attacks can fail catastrophically on restricted data
Quantile regression attacks achieve up to 11x higher TPR
Theoretical model explains generalization of the proposed approach
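The page does not say which operating point the "11x higher TPR" figure uses. MIA work commonly reports the true-positive rate at a fixed low false-positive rate. The sketch below shows that metric under two assumptions: higher attack scores mean "more likely a member", and the threshold is calibrated on known nonmembers. The function name `tpr_at_fpr` is hypothetical.

```python
import numpy as np

def tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.01):
    """Hypothetical helper: TPR of a threshold attack calibrated to a target FPR.

    Assumes higher scores indicate membership.
    """
    member_scores = np.asarray(member_scores, dtype=float)
    nonmember_scores = np.asarray(nonmember_scores, dtype=float)
    # Threshold at the (1 - target_fpr) empirical quantile of nonmember scores,
    # so roughly a target_fpr fraction of nonmembers is wrongly flagged.
    threshold = np.quantile(nonmember_scores, 1.0 - target_fpr)
    fpr = float(np.mean(nonmember_scores > threshold))
    tpr = float(np.mean(member_scores > threshold))
    return tpr, fpr
```

Because the threshold is fixed by the nonmember distribution alone, two attacks can be compared at the same FPR. A ratio such as "11x higher TPR" is only meaningful when both TPRs are measured at the same operating point.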
Abstract
The state of the art in membership inference attacks on machine learning models is a class of attacks based on shadow models that mimic the behavior of the target model on subsets of held-out nonmember data. However, we find that this class of attacks is fundamentally limited because of a key assumption: that the shadow models can replicate the target model's behavior on the distribution of interest. As a result, we show that attacks relying on shadow models can fail catastrophically on critical AI safety applications where data access is restricted due to legal, ethical, or logistical constraints, so that the shadow models have no reasonable signal on the query examples. Although this problem seems intractable within the shadow model paradigm, we find that quantile regression attacks are a promising approach in this setting, as these models learn features of member examples that can…
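A quantile regression attack needs no shadow models. It fits a regressor on public nonmember data to predict, for each example, a high quantile of the target model's score. A query example is flagged as a member when its actual score exceeds that example-specific threshold. The sketch below illustrates that general idea and is not the paper's implementation. Assumed for illustration: the linear model (the page mentions an analysis of linear quantile regression), the generic feature matrix `X`, the scalar score where larger means more member-like, the subgradient optimizer, the synthetic data, and all function names.

```python
import numpy as np

def _add_bias(X):
    # Append a constant column so the linear model has an intercept.
    X = np.asarray(X, dtype=float)
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_linear_quantile(X, y, q, lr=0.5, steps=3000):
    """Hypothetical helper: fit w to minimize the mean pinball loss of X @ w at
    quantile q, using full-batch subgradient descent (chosen for simplicity)."""
    Xb = _add_bias(X)
    y = np.asarray(y, dtype=float)
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        below = (y < Xb @ w).astype(float)  # 1 where residual y - pred < 0
        # Subgradient of mean pinball loss rho_q(y - Xb @ w) with respect to w.
        w -= lr * (Xb.T @ (below - q)) / len(y)
    return w

def predict_threshold(X, w):
    # Example-specific threshold: predicted q-quantile of nonmember scores.
    return _add_bias(X) @ w

def quantile_regression_attack(X_query, score_query, w):
    # Flag as "member" when the target model's score exceeds the threshold.
    return np.asarray(score_query, dtype=float) > predict_threshold(X_query, w)

# Synthetic demo: scores depend on a 1-D feature, so a single global
# threshold would ignore how "easy" each individual example is.
rng = np.random.default_rng(0)
alpha = 0.05  # desired false-positive rate on nonmembers
X_pub = rng.uniform(-1.0, 1.0, size=(2000, 1))
s_pub = 2.0 * X_pub[:, 0] + rng.normal(size=2000)  # public nonmember scores
w = fit_linear_quantile(X_pub, s_pub, q=1.0 - alpha)
```

In a real attack, `X` would be some representation of the query example, and the score would come from the target model, for example a label-free confidence margin. Only nonmember data is used for fitting, which is why the method does not depend on shadow models imitating the target model on the query distribution.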
Peer Reviews
Decision: Submitted to ICLR 2026
- Important Problem: Identifies a critical yet unstudied scenario in practical AI safety auditing with significant real-world implications.
- Systematic Evaluation: Comprehensive experiments across image, text, and tabular datasets demonstrate quantile regression's consistent superiority.
- Theoretical Support: Provides a theoretical explanation for why quantile regression generalizes to unseen classes.
- Insufficient Failure Analysis: Improvements are minimal on some datasets (e.g., CINIC-10). The paper lacks analysis of when quantile regression fails, or of how much data diversity is needed to ensure generalization.
- Unrealistic Assumptions: Assumes complete knowledge of the target model's architecture and training process, with no systematic evaluation of robustness to architecture mismatch or training differences. RMIA receives an unfair advantage (it uses unseen-class samples at evaluation) without adequate discussion
The paper introduces a compelling new threat model: unseen classes are a new and interesting attack angle. The proposed method performs roughly 10x better than prior techniques. The ROC curves are useful, the results are convincing, and the evaluation is well executed. There are no significant errors.
This paper doesn't introduce anything really new. The core method, quantile regression, is not new, and the overall scheme is applied in essentially the standard way. There is one small difference: instead of using the confidence on the true label (which requires knowing the ground-truth label, and that label may come from an unseen class), the paper uses the difference between the top two logits. With that exception, the techniques are all the same. This isn't a terribly bad limitation
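The review contrasts two per-example scores. The first is the confidence on the true label, which needs the ground-truth label. The second is the gap between the two largest logits, which needs no label and so still works when the query belongs to a class the attacker never saw. A minimal sketch of both, assuming raw logits are available as an `(n_examples, n_classes)` array. The function names are hypothetical.

```python
import numpy as np

def top2_logit_margin(logits):
    """Label-free score: largest logit minus second-largest logit, per row."""
    logits = np.asarray(logits, dtype=float)
    top2 = np.sort(logits, axis=-1)[..., -2:]  # ascending order: [second, first]
    return top2[..., 1] - top2[..., 0]

def true_label_confidence(logits, labels):
    """Label-dependent score: softmax probability assigned to the true label."""
    logits = np.asarray(logits, dtype=float)
    z = logits - logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return p[np.arange(len(labels)), labels]

# Margins for these two rows are 5.0 - 2.0 = 3.0 and 3.0 - 2.5 = 0.5.
margins = top2_logit_margin([[2.0, 5.0, 1.0], [3.0, -1.0, 2.5]])
```

The margin only reads the model's output, so it can be computed for any query. `true_label_confidence` cannot be evaluated without `labels`.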
- MIA for unseen classes is a practical yet underexplored threat model, e.g., in the context of CSAM detection. The proposed MIA method shows promise of better generalization to unseen classes than shadow-model-based approaches, and it is supported by an analysis of a linear quantile regression predictor under certain assumptions.
- Experiments cover a good range of tabular, image, and text datasets.
- It is mainly a comparison and analysis paper: all evaluated MIA methods already exist. The authors do not propose any new MIA that could boost performance on unseen classes; rather, they apply quantile regression MIA and argue that it is better than shadow-model-based methods.
- Lack of comparison to shadow-model-free MIA baselines: this includes the per-class population attack [Nasr et al. 2019, Ye et al. 2022], which is applicable when the adversary has access to even a small pool of targe
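The reviewer's suggested baseline needs no shadow models either. In a population-style attack, the threshold for each class is a high quantile of that class's scores on population (nonmember) data. A minimal sketch follows. Assumed: higher scores mean "more likely a member", the attacker knows each query's class, and the attacker has at least some nonmember data for every queried class. The helper names are hypothetical. The last assumption is exactly what the unseen-class setting may not satisfy.

```python
import numpy as np

def per_class_population_thresholds(scores, labels, alpha):
    """Hypothetical helper: per-class (1 - alpha) quantile of nonmember scores."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    return {c: np.quantile(scores[labels == c], 1.0 - alpha)
            for c in np.unique(labels)}

def population_attack(query_scores, query_labels, thresholds):
    """Flag a query as a member if its score exceeds its class's threshold.

    Raises KeyError for a class with no population data (an unseen class).
    """
    return np.array([s > thresholds[c]
                     for s, c in zip(query_scores, query_labels)])

# Population (nonmember) scores: class 0 spans 0..99, class 1 spans 100..199.
pop_scores = np.concatenate([np.arange(100), np.arange(100, 200)])
pop_labels = np.array([0] * 100 + [1] * 100)
thr = per_class_population_thresholds(pop_scores, pop_labels, alpha=0.05)
```

The KeyError path makes the limitation concrete: without population data for a class, this baseline cannot set a threshold for it.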
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data · Explainable Artificial Intelligence (XAI)
Methods: Dropout
