Depth Gives a False Sense of Privacy: LLM Internal States Inversion
Tian Dong, Yan Meng, Shaofeng Li, Guoxing Chen, Zhen Liu, Haojin Zhu

TL;DR
This paper demonstrates that the internal states of large language models can be inverted to recover the original input prompts with high accuracy using new inversion attacks, raising privacy concerns and challenging the assumption that deep model representations are irreversible.
Contribution
It introduces four new inversion attacks, including white-box and black-box methods, that significantly improve the ability to reconstruct inputs from LLM internal states.
Findings
High success rate in reconstructing long medical prompts
Nearly perfect inversion of 4,112-token prompts from Llama-3
Existing defenses are ineffective against proposed inversion attacks
Abstract
Large Language Models (LLMs) are increasingly integrated into daily routines, yet they raise significant privacy and safety concerns. Recent research proposes collaborative inference, which outsources the early-layer inference to ensure data locality, and introduces model safety auditing based on inner neuron patterns. Both techniques expose the LLM's Internal States (ISs), which are traditionally considered irreversible to inputs due to optimization challenges and the highly abstract representations in deep layers. In this work, we challenge this assumption by proposing four inversion attacks that significantly improve the semantic similarity and token matching rate of inverted inputs. Specifically, we first develop two white-box optimization-based attacks tailored for low-depth and high-depth ISs. These attacks avoid local minima convergence, a limitation observed in prior work,…
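To make the idea of a white-box, optimization-based inversion concrete, here is a minimal toy sketch in Python. It is not the paper's attack: the "model" is a single hypothetical layer, h = tanh(E[x] W), with an embedding table E and an orthogonal weight matrix W chosen purely for illustration, and all names (invert_states, E, W, secret) are assumptions. The attacker knows E and W, observes only the internal states h, runs gradient descent on a continuous stand-in for the input embeddings, and then projects each result onto the nearest vocabulary embedding. The last line computes a token matching rate, one of the two metrics the abstract mentions.

```python
import numpy as np

def invert_states(h, E, W, steps=2000, lr=0.5):
    """Recover token ids from internal states h = tanh(E[x] @ W) by gradient descent."""
    # Continuous stand-in for the unknown input embeddings, one row per position.
    z = np.zeros((h.shape[0], E.shape[1]))
    for _ in range(steps):
        out = np.tanh(z @ W)
        # Gradient of 0.5 * ||tanh(z W) - h||^2 with respect to z.
        grad = ((out - h) * (1.0 - out ** 2)) @ W.T
        z -= lr * grad
    # Project each optimized vector onto the nearest row of the embedding table.
    dists = ((z[:, None, :] - E[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

# Toy setup (all sizes are arbitrary illustration values).
rng = np.random.default_rng(0)
vocab, dim, seq_len = 50, 16, 8
E = 0.5 * rng.standard_normal((vocab, dim))            # hypothetical embedding table
W, _ = np.linalg.qr(rng.standard_normal((dim, dim)))   # hypothetical orthogonal layer weights

secret = rng.integers(0, vocab, size=seq_len)          # the private prompt, as token ids
h = np.tanh(E[secret] @ W)                             # internal states exposed to the attacker

recovered = invert_states(h, E, W)
match_rate = float((recovered == secret).mean())       # token matching rate
```

This toy layer treats each position independently and is easy to invert. Real LLM layers mix positions through attention and stack many nonlinear blocks, which is where the optimization difficulties described in the abstract arise.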
Taxonomy
Topics: Legal Systems and Judicial Processes · Law, AI, and Intellectual Property · Legal and Constitutional Studies
