Membership Inference for Contrastive Pre-training Models with Text-only PII Queries
Ruoxi Cheng, Yizhong Ding, Jian Zhao, Hongyi Zhang, Haoxuan Ma, Tianle Zhang, Yiyan Huang, Xuelong Li

TL;DR
This paper introduces UMID, a text-only framework for privacy auditing of contrastive models like CLIP and CLAP, enabling effective detection of memorized PII without exposing biometric data.
Contribution
The paper presents UMID, a novel text-only membership inference method that accurately detects PII memorization in multimodal models, avoiding biometric queries and reducing computational costs.
Findings
UMID achieves high detection accuracy across various models.
The framework operates efficiently with sub-second auditing time.
It effectively bypasses the need for biometric inputs in privacy auditing.
Abstract
Contrastive pretraining models such as CLIP and CLAP serve as the ubiquitous perceptual backbones for modern multimodal large models, yet their reliance on web-scale data raises growing concerns about memorizing Personally Identifiable Information (PII). Auditing such models via membership inference is challenging in practice: shadow-model MIAs are computationally prohibitive for large multimodal backbones, and existing multimodal auditing methods typically require querying the target with paired biometric inputs, thereby directly exposing sensitive biometric information to the target model. To bypass this critical limitation, we demonstrate a highly desirable capability for privacy auditing: multimodal memorization within these foundational encoders can be accurately inferred using exclusively the text modality. We propose Unimodal Membership Inference Detector (UMID), a text-only…
