TL;DR
This paper critically examines current model inversion (MI) attack evaluations, shows that they often overstate privacy risks because of false positives, and proposes a new evaluation framework based on multimodal large language models (MLLMs) for more reliable assessment.
Contribution
It identifies flaws in existing evaluation methods, demonstrates the adversarial nature of false positives, and introduces a systematic MLLM-based framework for accurate privacy risk measurement.
Findings
Existing evaluation frameworks overestimate attack success due to false positives.
Many false positives exhibit adversarial transferability, inflating privacy leakage metrics.
The proposed MLLM-based framework provides a more reliable standard for privacy assessment.
Abstract
Model Inversion (MI) attacks aim to reconstruct information from private training data by exploiting access to a target model. Nearly all recent MI studies evaluate attack success using a standard framework. This framework computes attack accuracy with a secondary evaluation model that is trained on the same private data and task design as the target model. In this paper, we present the first in-depth analysis of this dominant evaluation framework and reveal a fundamental issue: many reconstructions deemed successful under the existing framework are in fact false positives that do not capture the visual identity of the target individual. We first show that these MI false positives satisfy the same formal conditions as Type I adversarial examples. Through controlled experiments, we demonstrate extremely high false-positive transferability, an empirical signature characteristic of adversarial behavior, indicating…
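The standard evaluation described above reduces to counting agreements between an evaluation model's predictions and the intended target identities. The sketch below illustrates how false positives inflate that count and how false-positive transferability can be measured. It assumes the following inputs:

- Each reconstruction has already been classified by one or two evaluation models.
- A separate boolean identity judgment is available for each reconstruction. This judgment stands in for an MLLM-based or human check; the paper's actual protocol is not described in this summary.

All function names and the toy data are hypothetical illustrations, not the authors' implementation.

```python
from typing import Sequence


def attack_accuracy(preds: Sequence[int], targets: Sequence[int]) -> float:
    """Standard-framework accuracy: fraction of reconstructions that the
    evaluation model assigns to the intended target identity."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not targets:
        return 0.0
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)


def verified_accuracy(preds: Sequence[int], targets: Sequence[int],
                      verified: Sequence[bool]) -> float:
    """Count a success only if the evaluator agrees AND the (hypothetical)
    identity judgment confirms that the target's identity is visible."""
    if not targets:
        return 0.0
    hits = sum(p == t and v for p, t, v in zip(preds, targets, verified))
    return hits / len(targets)


def false_positive_rate(preds: Sequence[int], targets: Sequence[int],
                        verified: Sequence[bool]) -> float:
    """Among reconstructions the evaluator counts as successes, the fraction
    that the identity judgment rejects (i.e. false positives)."""
    successes = [v for p, t, v in zip(preds, targets, verified) if p == t]
    if not successes:
        return 0.0
    return sum(not v for v in successes) / len(successes)


def fp_transfer_rate(preds_a: Sequence[int], preds_b: Sequence[int],
                     targets: Sequence[int], verified: Sequence[bool]) -> float:
    """Among false positives under evaluator A, the fraction that a second
    evaluator B also assigns to the target identity."""
    fp_idx = [i for i, (p, t, v) in enumerate(zip(preds_a, targets, verified))
              if p == t and not v]
    if not fp_idx:
        return 0.0
    return sum(preds_b[i] == targets[i] for i in fp_idx) / len(fp_idx)


if __name__ == "__main__":
    # Toy data (invented for illustration): four reconstructions, four targets.
    targets = [0, 1, 2, 3]
    preds_a = [0, 1, 2, 0]               # evaluator A predictions
    preds_b = [0, 1, 5, 3]               # evaluator B predictions
    verified = [True, False, False, False]
    print(attack_accuracy(preds_a, targets))                       # 0.75
    print(verified_accuracy(preds_a, targets, verified))           # 0.25
    print(false_positive_rate(preds_a, targets, verified))         # 2 of 3 successes
    print(fp_transfer_rate(preds_a, preds_b, targets, verified))   # 0.5
```

The toy numbers are chosen only to show how the metrics relate: a 0.75 standard attack accuracy can shrink to 0.25 once identity is checked. A high transfer rate across independently trained evaluators is the kind of signal the abstract associates with adversarial behavior.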
