In-Context Probing for Membership Inference in Fine-Tuned Language Models

Zhexi Lu; Hongliang Chi; Nathalie Baracaldo; Swanand Ravindra Kadhe; Yuseok Jeon; Lei Yu

arXiv:2512.16292 · cs.CR · December 23, 2025

Open Access

TL;DR

This paper introduces ICP-MIA, a black-box membership inference attack that uses training dynamics and in-context probing to determine whether a sample was used to fine-tune a large language model, giving auditors a stronger tool for privacy auditing.

Contribution

The paper presents ICP-MIA, a training-free, theoretically grounded framework for membership inference against fine-tuned LLMs. It estimates the optimization gap, meaning the loss reduction a sample could still gain from further optimization, through in-context probing strategies.

Findings

01

ICP-MIA outperforms prior black-box MIAs at low false positive rates.

02

The effectiveness of ICP-MIA depends on reference data alignment and model configurations.

03

The approach provides a practical tool for privacy auditing of deployed LLMs.
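Finding 01 refers to performance at low false positive rates, which is commonly reported as the true positive rate (TPR) at a fixed false positive rate (FPR). The following is a minimal sketch of that metric, not the paper's evaluation code. The function name `tpr_at_fpr` and the convention that a higher score means "more likely a member" are assumptions made for illustration; a score where smaller means member would need to be negated first.

```python
from typing import Sequence


def tpr_at_fpr(
    member_scores: Sequence[float],
    nonmember_scores: Sequence[float],
    max_fpr: float,
) -> float:
    """Return the TPR at the strictest threshold whose FPR is <= max_fpr.

    Assumption: a higher score means the attack considers the sample
    more likely to be a training member.
    """
    if not member_scores or not nonmember_scores:
        raise ValueError("both score lists must be non-empty")
    if not 0.0 <= max_fpr <= 1.0:
        raise ValueError("max_fpr must lie in [0, 1]")

    ranked = sorted(nonmember_scores, reverse=True)
    # Number of non-members we may wrongly flag without exceeding max_fpr.
    allowed_false_positives = int(max_fpr * len(ranked))
    if allowed_false_positives >= len(ranked):
        return 1.0  # every sample may be flagged, so every member is caught

    # Flag a sample only if it scores strictly above this non-member score,
    # which keeps the false positive count at or below the allowance.
    threshold = ranked[allowed_false_positives]
    true_positives = sum(score > threshold for score in member_scores)
    return true_positives / len(member_scores)
```

Using a strict comparison against a non-member score keeps the realized FPR at or below the target even when scores are tied, at the cost of being slightly conservative.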

Abstract

Membership inference attacks (MIAs) pose a critical privacy threat to fine-tuned large language models (LLMs), especially when models are adapted to domain-specific tasks using sensitive data. While prior black-box MIA techniques rely on confidence scores or token likelihoods, these signals are often entangled with a sample's intrinsic properties - such as content difficulty or rarity - leading to poor generalization and low signal-to-noise ratios. In this paper, we propose ICP-MIA, a novel MIA framework grounded in the theory of training dynamics, particularly the phenomenon of diminishing returns during optimization. We introduce the Optimization Gap as a fundamental signal of membership: at convergence, member samples exhibit minimal remaining loss-reduction potential, while non-members retain significant potential for further optimization. To estimate this gap in a black-box…
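The abstract is truncated before it describes the estimation procedure, so the following is only a rough sketch of the idea it states: a member's loss has little room left to fall, while a non-member's loss can still drop. Here the gap is approximated as the average loss reduction on a target when a probe example is placed in context. Everything named below (`optimization_gap_score`, `predict_member`, `toy_loss`, the probe strings, and the threshold) is hypothetical and not taken from the paper.

```python
from typing import Callable, Sequence

# Assumed interface: loss_fn(context, target) returns the model's loss on
# `target` when `context` is prepended to the prompt (black-box query).
LossFn = Callable[[str, str], float]


def optimization_gap_score(
    loss_fn: LossFn, target: str, probes: Sequence[str]
) -> float:
    """Average drop in loss on `target` when each probe is given as context."""
    if not probes:
        raise ValueError("at least one probe is required")
    base_loss = loss_fn("", target)  # loss with no in-context help
    reductions = [base_loss - loss_fn(probe, target) for probe in probes]
    return sum(reductions) / len(reductions)


def predict_member(score: float, threshold: float) -> bool:
    """A small remaining gap suggests the sample was already optimized on."""
    return score < threshold


def toy_loss(context: str, target: str) -> float:
    """Hypothetical stand-in for a real model, used only to run the sketch."""
    memorized = {"patient record A"}
    if target in memorized:
        return 0.25  # already near its floor, so context cannot help
    return 2.0 if context == "" else 1.0  # context still lowers the loss
```

With the toy model, `optimization_gap_score(toy_loss, "patient record A", ["p1", "p2"])` yields a gap of zero, while an unseen target yields a positive gap, so a threshold between the two separates them. A real audit would replace `toy_loss` with queries to the deployed model and would need to choose probes and a threshold empirically.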

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Topic Modeling · Adversarial Robustness in Machine Learning