G-Drift MIA: Membership Inference via Gradient-Induced Feature Drift in LLMs

Ravi Ranjan; Utkarsh Grover; Xiaomin Lin; Agoritsa Polyzou

arXiv:2604.00419 · cs.LG · April 2, 2026

TL;DR

G-Drift MIA is a white-box membership inference attack that uses gradient-induced feature drift to determine whether an example was part of a large language model's training data. It outperforms confidence-, perplexity-, and reference-based attacks.

Contribution

The paper presents a novel gradient-based membership inference attack that outperforms existing approaches and links memorization to feature drift in LLMs.

Findings

01. G-Drift significantly outperforms confidence, perplexity, and reference-based attacks.

02. Memorized samples show smaller, more structured feature drift.

03. Gradient interventions can effectively audit training data membership.

Abstract

Large language models (LLMs) are trained on massive web-scale corpora, raising growing concerns about privacy and copyright. Membership inference attacks (MIAs) aim to determine whether a given example was used during training. Existing LLM MIAs largely rely on output probabilities or loss values and often perform only marginally better than random guessing when members and non-members are drawn from the same distribution. We introduce G-Drift MIA, a white-box membership inference method based on gradient-induced feature drift. Given a candidate (x,y), we apply a single targeted gradient-ascent step that increases its loss and measure the resulting changes in internal representations, including logits, hidden-layer activations, and projections onto fixed feature directions, before and after the update. These drift signals are used to train a lightweight logistic classifier that…
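
The drift measurement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a tiny two-layer network stands in for an LLM, and the step size `lr`, the feature `direction`, and all function names are hypothetical choices made for illustration. The sketch takes one gradient-ascent step on a candidate (x, y) and records how the logits, the hidden activations, and a projection onto a fixed direction change for that same input.

```python
import math
import random

def forward(params, x):
    """Toy stand-in for an LLM: h = tanh(W1 x), logits = W2 h."""
    W1, W2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    logits = [sum(w * hi for w, hi in zip(row, h)) for row in W2]
    return h, logits

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def loss_and_grads(params, x, y):
    """Cross-entropy loss on (x, y) and its gradients w.r.t. W1 and W2."""
    W1, W2 = params
    h, logits = forward(params, x)
    p = softmax(logits)
    loss = -math.log(p[y])
    dlogits = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
    gW2 = [[dl * hi for hi in h] for dl in dlogits]
    dh = [sum(dlogits[k] * W2[k][j] for k in range(len(W2))) for j in range(len(h))]
    dpre = [dhj * (1.0 - hj * hj) for dhj, hj in zip(dh, h)]  # tanh derivative
    gW1 = [[dp * xi for xi in x] for dp in dpre]
    return loss, (gW1, gW2)

def ascent_step(params, grads, lr):
    """One gradient-ASCENT step: move parameters so the candidate's loss goes up."""
    return tuple(
        [[w + lr * g for w, g in zip(wrow, grow)] for wrow, grow in zip(W, G)]
        for W, G in zip(params, grads)
    )

def l2(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def drift_features(params, x, y, direction, lr=0.1):
    """Drift signals for one candidate, measured before and after the update."""
    loss_before, grads = loss_and_grads(params, x, y)
    h0, z0 = forward(params, x)
    new_params = ascent_step(params, grads, lr)
    h1, z1 = forward(new_params, x)
    loss_after, _ = loss_and_grads(new_params, x, y)
    return {
        "loss_increase": loss_after - loss_before,
        "logit_drift": l2(z0, z1),
        "hidden_drift": l2(h0, h1),
        # Change in the hidden state along a fixed (here: arbitrary) direction.
        "projection_drift": sum(d * (b - a) for d, a, b in zip(direction, h0, h1)),
    }

random.seed(0)
W1 = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(4)]
W2 = [[random.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(2)]
params = (W1, W2)
direction = [0.5, 0.5, 0.5, 0.5]  # hypothetical unit-norm feature direction
features = drift_features(params, x=[1.0, -0.5, 0.25], y=1, direction=direction)
print(features)
```

In the setting the abstract describes, a feature vector like this would be computed for each candidate and passed to a lightweight logistic classifier. Which layers, directions, and step size the paper actually uses is not stated in this excerpt.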
