When Reasoning Leaks Membership: Membership Inference Attack on Black-box Large Reasoning Models
Ruihan Hu, Yu-Ming Shang, Wei Luo, Ye Tao, Xi Zhang

TL;DR
This paper shows that black-box large reasoning models leak membership information through their reasoning traces, which poses a privacy risk. It introduces BlackSpectrum, an attack framework that exploits these traces to infer whether a sample was part of the training data.
Contribution
The paper is the first to systematically explore membership inference attacks on black-box large reasoning models and proposes the BlackSpectrum attack framework leveraging reasoning trace representations.
Findings
Exposing reasoning traces increases vulnerability to membership inference attacks.
BlackSpectrum predicts training-data membership with high accuracy.
Two new datasets, arXivReasoning and BookReasoning, are introduced to support future research.
Abstract
Large Reasoning Models (LRMs) have rapidly gained prominence for their strong performance in solving complex tasks. Many modern black-box LRMs expose the intermediate reasoning traces through APIs to improve transparency (e.g., Gemini-2.5 and Claude-sonnet). Despite their benefits, we find that these traces can leak membership signals, creating a new privacy threat even without access to token logits used in prior attacks. In this work, we initiate the first systematic exploration of Membership Inference Attacks (MIAs) on black-box LRMs. Our preliminary analysis shows that LRMs produce confident, recall-like reasoning traces on familiar training member samples but more hesitant, inference-like reasoning traces on non-members. The representations of these traces are continuously distributed in the semantic latent space, spanning from familiar to unfamiliar samples. Building on this…
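The contrast the abstract draws between confident, recall-like traces and hesitant, inference-like traces can be illustrated with a toy sketch. This is not BlackSpectrum, which the abstract says works on trace representations in a semantic latent space; it is a much cruder keyword heuristic, swapped in only to make the intuition concrete. The marker lists, the function names (`count_markers`, `familiarity_score`, `guess_membership`), and the threshold are all hypothetical and do not come from the paper.

```python
import re

# Hypothetical phrase lists chosen for illustration only; not from the paper.
HESITANT_MARKERS = ("perhaps", "maybe", "might", "not sure", "possibly")
CONFIDENT_MARKERS = ("i recall", "i remember", "clearly", "definitely", "well-known")


def count_markers(text, markers):
    """Count whole-phrase, case-insensitive occurrences of each marker."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(marker) + r"\b", lowered))
        for marker in markers
    )


def familiarity_score(trace):
    """Return a score in [-1, 1]: positive means recall-like, negative hesitant.

    A trace containing no markers at all scores 0.0.
    """
    confident = count_markers(trace, CONFIDENT_MARKERS)
    hesitant = count_markers(trace, HESITANT_MARKERS)
    total = confident + hesitant
    if total == 0:
        return 0.0
    return (confident - hesitant) / total


def guess_membership(trace, threshold=0.0):
    """Toy decision rule (an assumption): guess 'member' above the threshold."""
    return familiarity_score(trace) > threshold


if __name__ == "__main__":
    recall_like = "I recall this passage; it is clearly from the introduction."
    hesitant = "Perhaps the next sentence might discuss results, but I am not sure."
    print(familiarity_score(recall_like), guess_membership(recall_like))
    print(familiarity_score(hesitant), guess_membership(hesitant))
```

A surface heuristic like this would likely be brittle in practice, since wording varies across models and prompts; it only shows why an exposed reasoning trace can carry a membership signal without any access to token logits.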
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data · Advanced Graph Neural Networks
