SoK: Semantic Privacy in Large Language Models

Baihe Ma; Yanna Jiang; Xu Wang; Guangsheng Yu; Qin Wang; Caijun Sun; Chen Li; Xuelei Qi; Ying He; Wei Ni; Ren Ping Liu

arXiv:2506.23603·cs.CR·July 17, 2025

Open Access

TL;DR

This paper systematically analyzes how semantic privacy risks emerge across the lifecycle of large language models, categorizes attack vectors, evaluates current defenses, and highlights critical gaps and open challenges.

Contribution

It introduces a lifecycle-centric framework for analyzing semantic privacy risks in LLMs and assesses the effectiveness of existing defenses, identifying key gaps and open challenges.

Findings

01. Current defenses inadequately protect against semantic inference attacks.

02. Significant gaps exist in protecting latent representations and contextual inferences.

03. Open challenges include quantifying semantic leakage and balancing privacy with generation quality.

Abstract

As Large Language Models (LLMs) are increasingly deployed in sensitive domains, traditional data privacy measures prove inadequate for protecting information that is implicit, contextual, or inferable, which we define as semantic privacy. This Systematization of Knowledge (SoK) introduces a lifecycle-centric framework to analyze how semantic privacy risks emerge across the input processing, pretraining, fine-tuning, and alignment stages of LLMs. We categorize key attack vectors and assess how current defenses, such as differential privacy, embedding encryption, edge computing, and unlearning, address these threats. Our analysis reveals critical gaps in semantic-level protection, especially against contextual inference and latent representation leakage. We conclude by outlining open challenges, including quantifying semantic leakage, protecting multimodal inputs, balancing de-identification…

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI