Towards Label-Only Membership Inference Attack against Pre-trained Large Language Models

Yu He; Boheng Li; Liu Liu; Zhongjie Ba; Wei Dong; Yiming Li; Zhan Qin; Kui Ren; Chun Chen

arXiv:2502.18943·cs.CR·February 27, 2025

TL;DR

This paper introduces PETAL, a novel label-only membership inference attack against pre-trained large language models, leveraging token-level semantic similarity to improve attack effectiveness without access to full output logits.

Contribution

The paper proposes PETAL, a new label-only MIA method that uses token semantic similarity to better infer membership in pre-trained LLMs, outperforming existing label-only attacks.
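To make the label-only idea concrete, the sketch below scores a sample using only the tokens a model generates, never its logits. It is a toy illustration, not PETAL's actual algorithm: the source states only that the attack uses token-level semantic similarity. The embedding table `TOY_EMBEDDINGS`, the mean aggregation in `similarity_score`, and the threshold rule in `predict_member` are all assumptions introduced here for illustration.

```python
import math

# Hypothetical toy embeddings (assumption). A real attack would obtain
# vectors from an actual embedding model rather than a hand-written table.
TOY_EMBEDDINGS = {
    "cat": (1.0, 0.0, 0.0),
    "kitten": (0.9, 0.1, 0.0),
    "dog": (0.6, 0.6, 0.0),
    "car": (0.0, 0.0, 1.0),
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_score(true_tokens, generated_tokens, embeddings):
    """Mean similarity between each true next token and the token the
    model generated at the same position (label-only signal).

    Averaging is an illustrative choice, not taken from the paper.
    """
    if not true_tokens or len(true_tokens) != len(generated_tokens):
        raise ValueError("token sequences must be non-empty and aligned")
    sims = [
        cosine(embeddings[t], embeddings[g])
        for t, g in zip(true_tokens, generated_tokens)
    ]
    return sum(sims) / len(sims)


def predict_member(score, threshold):
    """Flag a sample as a likely training member when its score is high.

    The threshold is hypothetical and would need calibration on data
    with known membership.
    """
    return score >= threshold
```

For example, `similarity_score(["cat"], ["kitten"], TOY_EMBEDDINGS)` is high because the generated token is semantically close to the true one, while `similarity_score(["cat"], ["car"], TOY_EMBEDDINGS)` is zero. An exact-match comparison would treat both cases as equal misses, which is the kind of coarseness a token-level similarity signal is meant to avoid.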

Findings

01 PETAL outperforms existing label-only MIAs on benchmarks.

02 PETAL achieves comparable results to logit-based attacks.

03 Pre-trained LLMs show minimal robustness differences between members and non-members.

Abstract

Membership Inference Attacks (MIAs) aim to predict whether a data sample belongs to a model's training set. Although prior research has extensively explored MIAs against Large Language Models (LLMs), existing attacks typically require access to the complete output logits (i.e., logits-based attacks), which are usually not available in practice. In this paper, we study the vulnerability of pre-trained LLMs to MIAs in the label-only setting, where the adversary can access only the generated tokens (text). We first reveal that existing label-only MIAs have little effect when attacking pre-trained LLMs, although they are highly effective at inferring the fine-tuning datasets used for personalized LLMs. We find that their failure stems from two main causes: better generalization and overly coarse perturbation. Specifically, due to the extensive pre-training corpora and exposing each…

Taxonomy

Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Artificial Intelligence in Healthcare and Education

Methods: Sparse Evolutionary Training