When Reasoning Leaks Membership: Membership Inference Attack on Black-box Large Reasoning Models

Ruihan Hu; Yu-Ming Shang; Wei Luo; Ye Tao; Xi Zhang

arXiv:2601.13607·cs.CR·January 21, 2026

TL;DR

This paper demonstrates that black-box large reasoning models leak membership information through reasoning traces, posing privacy risks, and introduces BlackSpectrum, a novel attack framework exploiting these traces to infer training data membership.

Contribution

The paper is the first to systematically explore membership inference attacks on black-box large reasoning models and proposes the BlackSpectrum attack framework leveraging reasoning trace representations.

Findings

01. Exposing reasoning traces increases vulnerability to membership inference attacks.

02. BlackSpectrum predicts training-data membership with high accuracy.

03. Two new datasets, arXivReasoning and BookReasoning, support future research.

Abstract

Large Reasoning Models (LRMs) have rapidly gained prominence for their strong performance in solving complex tasks. Many modern black-box LRMs expose the intermediate reasoning traces through APIs to improve transparency (e.g., Gemini-2.5 and Claude-sonnet). Despite their benefits, we find that these traces can leak membership signals, creating a new privacy threat even without access to token logits used in prior attacks. In this work, we initiate the first systematic exploration of Membership Inference Attacks (MIAs) on black-box LRMs. Our preliminary analysis shows that LRMs produce confident, recall-like reasoning traces on familiar training member samples but more hesitant, inference-like reasoning traces on non-members. The representations of these traces are continuously distributed in the semantic latent space, spanning from familiar to unfamiliar samples. Building on this…
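
The abstract's observation that trace representations spread continuously from familiar to unfamiliar samples can be illustrated with a small sketch. This is not the BlackSpectrum method, whose details are not given here; it is only a toy projection score under stated assumptions. The function names (`centroid`, `familiarity_score`), the 2-D vectors standing in for trace embeddings, and the idea of using reference centroids are all hypothetical choices made for illustration.

```python
def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def familiarity_score(embedding, member_refs, nonmember_refs):
    """Project a trace embedding onto the axis running from the non-member
    reference centroid to the member reference centroid.

    A score near 0 means the embedding sits near the non-member centroid,
    a score near 1 means it sits near the member centroid. This axis is an
    illustrative assumption, not the paper's construction.
    """
    c_mem = centroid(member_refs)
    c_non = centroid(nonmember_refs)
    axis = [m - n for m, n in zip(c_mem, c_non)]
    norm_sq = sum(a * a for a in axis)
    if norm_sq == 0:
        raise ValueError("reference centroids coincide; axis is undefined")
    offset = [e - n for e, n in zip(embedding, c_non)]
    return sum(o * a for o, a in zip(offset, axis)) / norm_sq


# Toy 2-D vectors standing in for embeddings of reasoning traces (made up).
member_refs = [[1.0, 0.0], [0.8, 0.2]]
nonmember_refs = [[0.0, 1.0], [0.2, 0.8]]

for trace in ([0.9, 0.1], [0.5, 0.5], [0.1, 0.9]):
    print(trace, familiarity_score(trace, member_refs, nonmember_refs))
```

In a real setting the vectors would come from a sentence-embedding model applied to the reasoning traces returned by the API, and a threshold on the score would have to be calibrated on held-out data; both steps are left out of this sketch.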

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data · Advanced Graph Neural Networks