SoK: Semantic Privacy in Large Language Models
Baihe Ma, Yanna Jiang, Xu Wang, Guangsheng Yu, Qin Wang, Caijun Sun, Chen Li, Xuelei Qi, Ying He, Wei Ni, Ren Ping Liu

TL;DR
This paper systematically analyzes how semantic privacy risks emerge across the lifecycle of large language models. It categorizes attack vectors, evaluates current defenses, and highlights critical gaps and future challenges.
Contribution
It introduces a lifecycle-centric framework for analyzing semantic privacy risks in LLMs and assesses the effectiveness of existing defenses, identifying key gaps and open challenges.
Findings
Current defenses inadequately protect against semantic inference attacks.
Significant gaps exist in protecting latent representations and contextual inferences.
Open challenges include quantifying semantic leakage and balancing privacy with generation quality.
Abstract
As Large Language Models (LLMs) are increasingly deployed in sensitive domains, traditional data privacy measures prove inadequate for protecting information that is implicit, contextual, or inferable, which we define as semantic privacy. This Systematization of Knowledge (SoK) introduces a lifecycle-centric framework to analyze how semantic privacy risks emerge across input processing, pretraining, fine-tuning, and alignment stages of LLMs. We categorize key attack vectors and assess how current defenses, such as differential privacy, embedding encryption, edge computing, and unlearning, address these threats. Our analysis reveals critical gaps in semantic-level protection, especially against contextual inference and latent representation leakage. We conclude by outlining open challenges, including quantifying semantic leakage, protecting multimodal inputs, balancing de-identification…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI
