SoK: The Privacy Paradox of Large Language Models: Advancements, Privacy Risks, and Mitigation
Yashothara Shanmugarasa, Ming Ding, M.A.P Chamikara, Thierry Rakotoarivelo

TL;DR
This paper provides a comprehensive survey of privacy risks associated with large language models, covering training data, user prompts, generated outputs, and LLM agents, and evaluates mitigation strategies.
Contribution
It categorizes privacy challenges into four areas of LLMs (training data, user prompts, generated outputs, and LLM agents) and analyzes existing mitigation methods, highlighting gaps and future research directions.
Findings
Existing mitigation methods have limitations in addressing privacy risks.
Privacy vulnerabilities extend beyond training data to user prompts, generated outputs, and LLM agents.
The paper identifies key areas needing further research.
Abstract
Large language models (LLMs) are sophisticated artificial intelligence systems that enable machines to generate human-like text with remarkable precision. While LLMs offer significant technological progress, their development using vast amounts of user data scraped from the web and collected from extensive user interactions poses risks of sensitive information leakage. Most existing surveys focus on the privacy implications of the training data but tend to overlook privacy risks from user interactions and advanced LLM capabilities. This paper aims to fill that gap by providing a comprehensive analysis of privacy in LLMs, categorizing the challenges into four main areas: (i) privacy issues in LLM training data, (ii) privacy challenges associated with user prompts, (iii) privacy vulnerabilities in LLM-generated outputs, and (iv) privacy challenges involving LLM agents. We evaluate the…
