SoK: The Privacy Paradox of Large Language Models: Advancements, Privacy Risks, and Mitigation

Yashothara Shanmugarasa; Ming Ding; M.A.P Chamikara; Thierry Rakotoarivelo

arXiv:2506.12699 · cs.CR · June 23, 2025

TL;DR

This paper provides a comprehensive survey of privacy risks associated with large language models, covering training data, user prompts, generated outputs, and LLM agents, and evaluates mitigation strategies.

Contribution

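Unlike most existing surveys, which focus on training data, the paper categorizes privacy challenges into four areas (training data, user prompts, generated outputs, and LLM agents) and analyzes existing mitigation methods, highlighting gaps and future research directions.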

Findings

01. Existing mitigation methods have limitations in addressing privacy risks.

02. Privacy vulnerabilities exist in user prompts and generated outputs.

03. The paper identifies key areas needing further research.

Abstract

Large language models (LLMs) are sophisticated artificial intelligence systems that enable machines to generate human-like text with remarkable precision. While LLMs offer significant technological progress, their development using vast amounts of user data scraped from the web and collected from extensive user interactions poses risks of sensitive information leakage. Most existing surveys focus on the privacy implications of the training data but tend to overlook privacy risks from user interactions and advanced LLM capabilities. This paper aims to fill that gap by providing a comprehensive analysis of privacy in LLMs, categorizing the challenges into four main areas: (i) privacy issues in LLM training data, (ii) privacy challenges associated with user prompts, (iii) privacy vulnerabilities in LLM-generated outputs, and (iv) privacy challenges involving LLM agents. We evaluate the…
