PrivacyXray: Detecting Privacy Breaches in LLMs through Semantic Consistency and Probability Certainty

Jinwen He; Yiyang Lu; Zijin Lin; Kai Chen; Yue Zhao

arXiv:2506.19563 · cs.CR · June 25, 2025

TL;DR

PrivacyXray is a novel framework that detects privacy breaches in large language models by analyzing their internal states, leveraging semantic and probabilistic metrics to identify private information leaks during inference.

Contribution

The paper introduces a detection method that does not rely on external datasets, synthesizes realistic private data, and achieves high accuracy in identifying privacy breaches across multiple LLMs.

Findings

01. Achieves 92.69% average accuracy in privacy breach detection.

02. Outperforms state-of-the-art methods with a 20.06% accuracy increase.

03. Demonstrates stability and practical utility in real-world scenarios.

Abstract

Large Language Models (LLMs) are widely used in sensitive domains, including healthcare, finance, and legal services, raising concerns about potential private information leaks during inference. Privacy extraction attacks, such as jailbreaking, expose vulnerabilities in LLMs by crafting inputs that force the models to output sensitive information. However, these attacks cannot verify whether the extracted private information is accurate, as no public datasets exist for cross-validation, leaving a critical gap in private information detection during inference. To address this, we propose PrivacyXray, a novel framework detecting privacy breaches by analyzing LLM inner states. Our analysis reveals that LLMs exhibit higher semantic coherence and probabilistic certainty when generating correct private outputs. Based on this, PrivacyXray detects privacy breaches using four metrics:…
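The abstract's two signals, semantic coherence and probabilistic certainty, can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract is truncated before the four metrics are named, so the functions below (`token_certainty`, `cosine_similarity`, `mean_adjacent_similarity`) are hypothetical stand-ins that assume certainty is read from a token's output logits and coherence from the similarity of hidden-state vectors at consecutive layers.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_certainty(logits):
    """Hypothetical certainty signal: (max probability, entropy in nats).

    A peaked distribution gives a high max probability and low entropy.
    """
    probs = softmax(logits)
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return max(probs), entropy


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_adjacent_similarity(hidden_states):
    """Hypothetical coherence signal: average cosine similarity between
    the hidden-state vectors of consecutive layers (needs >= 2 layers)."""
    pairs = zip(hidden_states, hidden_states[1:])
    sims = [cosine_similarity(u, v) for u, v in pairs]
    return sum(sims) / len(sims)
```

In a real setting the logits and per-layer hidden states would come from the model under test, and a classifier trained on such features would make the final decision; here both inputs are plain lists of floats so the sketch stays self-contained.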

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Digital and Cyber Forensics · Data Quality and Management