What Hard Tokens Reveal: Exploiting Low-confidence Tokens for Membership Inference Attacks against Large Language Models

Md Tasnim Jawad; Mingyan Xiao; Yanzhao Wu

arXiv:2601.20885 · cs.CR · January 30, 2026

TL;DR

This paper introduces a novel membership inference attack method that leverages low-confidence token analysis in large language models, significantly improving attack effectiveness over existing sequence-level approaches.

Contribution

The study proposes HT-MIA, a new token-level approach that exploits hard tokens to enhance membership inference attacks on LLMs, outperforming previous methods.

Findings

01 HT-MIA outperforms seven state-of-the-art MIA baselines.

02 Token-level analysis reveals stronger membership signals at hard tokens.

03 Differential privacy effectively defends against the proposed attack.

Abstract

With the widespread adoption of Large Language Models (LLMs) and increasingly stringent privacy regulations, protecting data privacy in LLMs has become essential, especially for privacy-sensitive applications. Membership Inference Attacks (MIAs) attempt to determine whether a specific data sample was included in the model training/fine-tuning dataset, posing serious privacy risks. However, most existing MIA techniques against LLMs rely on sequence-level aggregated prediction statistics, which fail to distinguish prediction improvements caused by generalization from those caused by memorization, leading to low attack effectiveness. To address this limitation, we propose a novel membership inference approach that captures the token-level probabilities for low-confidence (hard) tokens, where membership signals are more pronounced. By comparing token-level probability improvements at hard…
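The token-level idea in the abstract can be illustrated with a minimal sketch. It is not the authors' HT-MIA implementation: the abstract is truncated before the method is fully specified, so everything below is an assumption for illustration. In particular, the sketch assumes that "hard" tokens are the positions where a hypothetical reference model assigns the lowest probability to the true token, and that the membership score is the mean log-probability gain of the target model over that reference at those positions. The names `hard_token_score` and `hard_fraction` are hypothetical.

```python
import math
from typing import Sequence


def hard_token_score(
    target_probs: Sequence[float],
    reference_probs: Sequence[float],
    hard_fraction: float = 0.2,
) -> float:
    """Mean log-probability gain of the target model at hard tokens.

    Assumption (not from the paper): a token is "hard" if it falls in the
    lowest `hard_fraction` of reference-model probabilities for the sequence.
    Each input holds the probability a model assigns to the true token at
    each position of the same sequence.
    """
    if len(target_probs) != len(reference_probs):
        raise ValueError("probability sequences must have equal length")
    if not reference_probs:
        raise ValueError("sequences must be non-empty")
    if not 0.0 < hard_fraction <= 1.0:
        raise ValueError("hard_fraction must be in (0, 1]")

    n = len(reference_probs)
    k = max(1, math.ceil(hard_fraction * n))

    # Indices of the k tokens the reference model finds hardest.
    hard_idx = sorted(range(n), key=lambda i: reference_probs[i])[:k]

    # Log-probability improvement of the target over the reference.
    gains = [
        math.log(target_probs[i]) - math.log(reference_probs[i])
        for i in hard_idx
    ]
    return sum(gains) / k


# Toy probabilities, invented purely for illustration.
reference = [0.9, 0.05, 0.8, 0.1, 0.7]
target_on_member = [0.9, 0.6, 0.8, 0.5, 0.7]       # large gains at hard tokens
target_on_nonmember = [0.9, 0.06, 0.8, 0.1, 0.7]   # almost no gain

member_score = hard_token_score(target_on_member, reference, hard_fraction=0.4)
nonmember_score = hard_token_score(target_on_nonmember, reference, hard_fraction=0.4)
print(member_score > nonmember_score)
```

Restricting the comparison to hard tokens reflects the abstract's argument that easy tokens improve through generalization for members and non-members alike, so averaging over the whole sequence dilutes the membership signal. In practice a threshold on the score would have to be calibrated on data with known membership.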

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data · Artificial Intelligence in Healthcare and Education