A Unified Perspective on Adversarial Membership Manipulation in Vision Models
Ruize Gao, Kaiwen Zhou, Yongqiang Chen, Feng Liu

TL;DR
This paper uncovers a new adversarial vulnerability in membership inference attacks on vision models: imperceptible perturbations can cause non-member images to be falsely classified as members. It also proposes detection methods to counteract this manipulation.
Contribution
It provides the first unified analysis of adversarial membership manipulation, revealing its mechanism and geometric signature and proposing a detection strategy to improve robustness.
Findings
Adversarial membership fabrication is effective across architectures and datasets.
A geometric signature distinguishes fabricated from true members.
Detection strategies significantly reduce manipulation success.
Abstract
Membership inference attacks (MIAs) aim to determine whether a specific data point was part of a model's training set, serving as effective tools for evaluating privacy leakage of vision models. However, existing MIAs implicitly assume honest query inputs, and their adversarial robustness remains unexplored. We show that MIAs for vision models expose a previously overlooked adversarial surface: adversarial membership manipulation, where imperceptible perturbations can reliably push non-member images into the "member" region of state-of-the-art MIAs. In this paper, we provide the first unified perspective on this phenomenon by analyzing its mechanism and implications. We begin by demonstrating that adversarial membership fabrication is consistently effective across diverse architectures and datasets. We then reveal a distinctive geometric signature - a characteristic gradient-norm…
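To make the idea of membership fabrication concrete, here is a minimal sketch in Python. It is not the authors' method: it assumes a toy logistic-regression "model", a simple loss-threshold MIA (an input is called a member when its loss falls below a threshold `tau`) standing in for the state-of-the-art MIAs mentioned in the abstract, and a signed-gradient perturbation bounded in the L-infinity norm. All function names, weights, and parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, y):
    """Binary cross-entropy of a logistic model on a single example."""
    p = sigmoid(x @ w + b)
    eps = 1e-12  # guards against log(0)
    return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def loss_grad_x(w, b, x, y):
    """Gradient of the loss with respect to the input x."""
    return (sigmoid(x @ w + b) - y) * w

def loss_threshold_mia(w, b, x, y, tau):
    """Toy MIA (assumption): call the input a member if its loss is below tau."""
    return loss(w, b, x, y) < tau

def fabricate_membership(w, b, x, y, epsilon, step, n_steps):
    """Lower the loss with signed-gradient steps, staying within an
    L-infinity ball of radius epsilon around the original input."""
    x_adv = x.copy()
    for _ in range(n_steps):
        x_adv = x_adv - step * np.sign(loss_grad_x(w, b, x_adv, y))
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)
    return x_adv

# Hypothetical model weights and a non-member input with label 1.
w = np.array([1.0, -2.0, 0.5, 1.5])
b = 0.0
x = np.array([0.1, 0.0, 0.1, 0.0])
y = 1
tau = 0.5
epsilon = 0.1

x_adv = fabricate_membership(w, b, x, y, epsilon=epsilon, step=0.02, n_steps=10)

before = loss_threshold_mia(w, b, x, y, tau)     # MIA decision on the clean input
after = loss_threshold_mia(w, b, x_adv, y, tau)  # MIA decision on the perturbed input
max_change = np.max(np.abs(x_adv - x))           # size of the perturbation
```

In this toy setting the clean input is rejected by the threshold test while the perturbed one is accepted, even though no coordinate moves by more than `epsilon`. Attacks on real vision models and stronger MIAs would presumably target the MIA's own score rather than the raw loss, but the structure (a bounded perturbation optimized against the membership signal) is the same idea the abstract describes.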
