Semantic Membership Inference Attack against Large Language Models

Hamid Mozaffari; Virendra J. Marathe

arXiv:2406.10218·cs.LG·June 17, 2024·1 cites


Open Access · 3 Reviews

TL;DR

This paper introduces SMIA, a semantic membership inference attack that trains a neural network on a target model's behavior over perturbed inputs to decide whether a text was in the training set. On Pythia-12B it reaches an AUC-ROC of 67.39%, compared to 58.90% for the second-best attack.

Contribution

The paper presents SMIA, a new semantic membership inference attack leveraging input perturbations to enhance attack accuracy on large language models.

Findings

01. SMIA achieves an AUC-ROC of 67.39% on Pythia-12B.

02. SMIA outperforms existing MIAs; the second-best attack reaches an AUC-ROC of 58.90% on Pythia-12B.

03. Evaluations on the Pythia and GPT-Neo model families, using the Wikipedia dataset, demonstrate the attack's effectiveness.
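The AUC-ROC figure in the findings can be read as the probability that an attack scores a randomly chosen training member higher than a randomly chosen non-member. A minimal sketch of that computation follows; the function name `auc_roc` and the toy scores are assumptions made for illustration and do not come from the paper.

```python
from typing import Sequence


def auc_roc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """Probability that a random member outscores a random non-member.

    Ties count as half a win, which matches the area under the ROC curve.
    """
    if not member_scores or not nonmember_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


# Toy attack scores, invented for illustration (not results from the paper).
members = [0.9, 0.7, 0.6, 0.4]
nonmembers = [0.8, 0.5, 0.3, 0.2]
print(auc_roc(members, nonmembers))  # 12 of 16 pairs are ranked correctly -> 0.75
```

An attack that guesses at random would score about 0.5 under this measure, which is the baseline against which figures such as 67.39% and 58.90% are usually read.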

Abstract

Membership Inference Attacks (MIAs) determine whether a specific data point was included in the training set of a target model. In this paper, we introduce the Semantic Membership Inference Attack (SMIA), a novel approach that enhances MIA performance by leveraging the semantic content of inputs and their perturbations. SMIA trains a neural network to analyze the target model's behavior on perturbed inputs, effectively capturing variations in output probability distributions between members and non-members. We conduct comprehensive evaluations on the Pythia and GPT-Neo model families using the Wikipedia dataset. Our results show that SMIA significantly outperforms existing MIAs; for instance, SMIA achieves an AUC-ROC of 67.39% on Pythia-12B, compared to 58.90% by the second-best attack.
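As a rough illustration of the kind of signal an attack of this type could use, the sketch below pairs each perturbed neighbour's semantic distance from the original text with the change in the target model's loss. All names, embeddings, and loss values here are hypothetical: a real pipeline would obtain losses from the target model and embeddings from an embedding model, and SMIA trains a neural network on such signals, whereas `mean_loss_change` is only a hand-written stand-in.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("vectors must be non-zero")
    return 1.0 - dot / (norm_u * norm_v)


def neighbour_features(
    x_loss: float,
    x_emb: Sequence[float],
    neighbour_losses: Sequence[float],
    neighbour_embs: Sequence[Sequence[float]],
) -> List[Tuple[float, float]]:
    """Build one (semantic distance, loss change) pair per perturbed neighbour."""
    if len(neighbour_losses) != len(neighbour_embs):
        raise ValueError("need one embedding per neighbour loss")
    return [
        (cosine_distance(x_emb, emb), loss - x_loss)
        for loss, emb in zip(neighbour_losses, neighbour_embs)
    ]


def mean_loss_change(features: Sequence[Tuple[float, float]]) -> float:
    """Hypothetical stand-in score: average loss increase over the neighbours."""
    if not features:
        raise ValueError("features must be non-empty")
    return sum(delta for _, delta in features) / len(features)


# Hypothetical values: one input with two neighbours in a 2-d embedding space.
features = neighbour_features(
    x_loss=2.0,
    x_emb=[1.0, 0.0],
    neighbour_losses=[2.5, 3.0],
    neighbour_embs=[[1.0, 0.0], [0.0, 1.0]],
)
print(features)
print(mean_loss_change(features))
```

A learned classifier would take the place of `mean_loss_change`, so that it can weigh a loss change differently depending on how far the neighbour has drifted semantically from the original input.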

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 3 · Confidence 5

Strengths

1. The paper explains its methodology well. The method builds on the observation that a classifier trained on top of LLM behaviors can learn membership signals effectively.
2. The work demonstrates strong gains over past MIAs across multiple benchmark datasets.

Weaknesses

Several works have by now argued that "WC"-based assessment of MIAs, where the data is split across a cut-off date, is not sound. This work bases many of its gains on that setting.
[1] Do Membership Inference Attacks Work on Large Language Models? https://arxiv.org/abs/2402.07841.
[2] LLM Dataset Inference: Did you train on my dataset? https://arxiv.org/abs/2406.06443.
[3] Blind Baselines Beat Membership Inference Attacks for Foundation Models. https://arx

Reviewer 02 · Rating 5 · Confidence 4

Strengths

The authors present a well-designed pipeline that combines neighbour generation using masked language models, semantic embedding analysis via the Cohere model, and neural network classification. This comprehensive approach allows SMIA to detect both exact matches and semantically similar content, representing a significant advancement over existing methods. The experimental evaluation is particularly thorough, examining performance across different model sizes, architectures, and datasets. The a

Weaknesses

1. The choice of key hyperparameters, particularly the use of 25 neighbours, is not clearly justified. While Table 7 shows performance improvements with increasing neighbour count, there is no clear analysis of the trade-off between computational cost and performance gain. The paper should examine the diminishing returns beyond 25 neighbours and justify why this specific number best balances effectiveness and efficiency.
2. A concerning weakness emerges in the method's inconsistent perfo

Reviewer 03 · Rating 3 · Confidence 3

Strengths

1. The proposed method is highly novel.
2. The proposed method performs well on both AUC and TPR at low FPR.
3. By design, the proposed method identifies membership even when the data has undergone slight modifications.

Weaknesses

1. Although the authors claim the proposed method is designed for grey-box models, there are no direct experiments on real grey-box models.
2. The ablation experiments are not very comprehensive. For example, the authors could report results with a different embedding model or a different classification model.
3. There is no intuitive explanation of the proposed method, such as how the loss trends of members differ from the loss trends of non-members


Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Adversarial Robustness in Machine Learning

Methods: Sparse Evolutionary Training · GPT-Neo · Pythia