PrivacyXray: Detecting Privacy Breaches in LLMs through Semantic Consistency and Probability Certainty
Jinwen He, Yiyang Lu, Zijin Lin, Kai Chen, Yue Zhao

TL;DR
PrivacyXray is a novel framework that detects privacy breaches in large language models by analyzing their internal states, leveraging semantic and probabilistic metrics to identify private information leaks during inference.
Contribution
It introduces a detection method that requires no external datasets, synthesizes realistic private data, and identifies privacy breaches with high accuracy across multiple LLMs.
Findings
Achieves 92.69% average accuracy in privacy breach detection.
Outperforms state-of-the-art methods, improving accuracy by 20.06%.
Demonstrates stability and practical utility in real-world scenarios.
Abstract
Large Language Models (LLMs) are widely used in sensitive domains, including healthcare, finance, and legal services, raising concerns about potential private information leaks during inference. Privacy extraction attacks, such as jailbreaking, expose vulnerabilities in LLMs by crafting inputs that force the models to output sensitive information. However, these attacks cannot verify whether the extracted private information is accurate, because no public datasets exist for cross-validation. This leaves a critical gap in detecting private information during inference. To address this, we propose PrivacyXray, a novel framework that detects privacy breaches by analyzing the internal states of LLMs. Our analysis reveals that LLMs exhibit higher semantic coherence and probabilistic certainty when generating correct private outputs. Based on this, PrivacyXray detects privacy breaches using four metrics:…
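To make the two signals in the abstract concrete, here is a minimal sketch under stated assumptions. It does not reproduce PrivacyXray's four metrics. It assumes you already have, for a generated span, the model's per-step next-token distributions and per-token hidden vectors. Probabilistic certainty is approximated by mean token log-probability and mean entropy; semantic coherence is approximated by mean pairwise cosine similarity of hidden vectors. All function names (`probability_certainty`, `semantic_consistency`, `flag_possible_breach`) and the thresholds are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical stand-ins for "probabilistic certainty" and "semantic coherence";
# these are NOT the four PrivacyXray metrics, just generic illustrations.


def token_entropy(dist: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution; lower means more certain."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def probability_certainty(dists: List[Sequence[float]], chosen: List[int]) -> Dict[str, float]:
    """Summarize certainty over a generated span.

    dists[i] is the next-token distribution at generation step i (assumed to
    come from the model's softmaxed logits); chosen[i] is the index of the
    token actually emitted at that step.
    """
    if not dists or len(dists) != len(chosen):
        raise ValueError("need one chosen token per non-empty distribution")
    logps = [math.log(d[t]) for d, t in zip(dists, chosen)]
    return {
        "mean_logprob": sum(logps) / len(logps),
        "mean_entropy": sum(token_entropy(d) for d in dists) / len(dists),
    }


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("zero-length vector")
    return dot / norm


def semantic_consistency(hidden: List[Sequence[float]]) -> float:
    """Mean pairwise cosine similarity of per-token hidden vectors in the span."""
    n = len(hidden)
    if n < 2:
        raise ValueError("need at least two hidden vectors")
    sims = [cosine(hidden[i], hidden[j]) for i in range(n) for j in range(i + 1, n)]
    return sum(sims) / len(sims)


def flag_possible_breach(certainty: Dict[str, float], consistency: float,
                         min_logprob: float = -1.0, min_consistency: float = 0.5) -> bool:
    """Hypothetical rule: high certainty AND high coherence -> output more likely a real leak.

    The thresholds are illustrative placeholders, not values from the paper.
    """
    return certainty["mean_logprob"] >= min_logprob and consistency >= min_consistency


if __name__ == "__main__":
    # Toy numbers only; no real model outputs are used here.
    confident = probability_certainty([[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]], [0, 0])
    hesitant = probability_certainty([[0.4, 0.3, 0.3], [0.34, 0.33, 0.33]], [1, 1])
    coherent = semantic_consistency([[1.0, 0.0], [0.9, 0.1], [0.95, 0.05]])
    scattered = semantic_consistency([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
    print(flag_possible_breach(confident, coherent))  # True
    print(flag_possible_breach(hesitant, scattered))  # False
```

The toy run shows the intended contrast: a span emitted with peaked distributions and aligned hidden vectors is flagged, while a hesitant, semantically scattered span is not. In practice the hidden vectors and distributions would come from a specific layer and decoding setup, which this sketch leaves open.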
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Digital and Cyber Forensics · Data Quality and Management
