TUNI: A Textual Unimodal Detector for Identity Inference in CLIP Models

Songze Li; Ruoxi Cheng; Xiaojun Jia

arXiv:2405.14517 · cs.LG · March 26, 2025

TL;DR

This paper introduces TUNI, a textual unimodal detector for identity inference in CLIP models. It queries the target model with text only, avoiding both image queries and shadow-model training, and is reported to outperform baseline methods across multiple datasets.

Contribution

TUNI is presented as the first method to perform identity inference in CLIP models using only text data. Dropping image queries avoids exposing a person's photo to the target model, and dropping shadow models avoids the cost of training them.

Findings

01. TUNI outperforms baseline methods in identity inference accuracy.

02. TUNI effectively operates without image data, reducing privacy exposure.

03. The method is robust across various CLIP architectures and datasets.

Abstract

The widespread usage of large-scale multimodal models like CLIP has heightened concerns about the leakage of personally identifiable information (PII). Existing methods for identity inference in CLIP models require querying the model with full PII, including textual descriptions of the person and corresponding images (e.g., the name and the face photo of the person). However, submitting images may risk exposing personal information to the target model, as the image might not have been previously encountered by the target model. Additionally, previous membership inference attacks (MIAs) train shadow models to mimic the behavior of the target model, which incurs high computational costs, especially for large CLIP models. To address these challenges, we propose a textual unimodal detector (TUNI) for CLIP models, a novel technique for identity inference that: 1) only utilizes text data to query the target model; and 2) eliminates the need for training shadow…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training