From Weak Cues to Real Identities: Evaluating Inference-Driven De-Anonymization in LLM Agents
Myeongseob Ko, Jihyun Jeong, Sumiran Singh Thakur, Gyuhak Kim, and Ruoxi Jia

TL;DR
This paper demonstrates that large language model-based agents can autonomously infer real-world identities from scattered, non-identifying cues, posing a significant privacy risk that surpasses traditional explicit data disclosures.
Contribution
It formalizes the inference-driven linkage threat and systematically evaluates its effectiveness across multiple realistic scenarios, revealing substantial privacy vulnerabilities.
Findings
Agents successfully perform identity linkage without task-specific heuristics.
In the Netflix setting, agents reconstruct 79.2% of identities, outperforming the baseline.
Identity inference occurs even without explicit adversarial prompts.
Abstract
Anonymization is widely treated as a practical safeguard because re-identifying anonymous records was historically costly, requiring domain expertise, tailored algorithms, and manual corroboration. We study a growing privacy risk that may weaken this barrier: LLM-based agents can autonomously reconstruct real-world identities from scattered, individually non-identifying cues. By combining these sparse cues with public information, agents resolve identities without bespoke engineering. We formalize this threat as inference-driven linkage and systematically evaluate it across three settings: classical linkage scenarios (Netflix and AOL), InferLink (a controlled benchmark varying task intent, shared cues, and attacker knowledge), and modern text-rich artifacts. Without task-specific heuristics, agents successfully execute both fixed-pool matching and open-ended identity…
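To make the notion of fixed-pool matching concrete, the following is a minimal sketch of a classical overlap heuristic. It is not the paper's agent-based method: it is the kind of hand-built matcher that the abstract says agents no longer need. The function name `link_record`, the cue strings, the candidate labels, and the `min_margin` rule are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional, Set


def link_record(
    anon_cues: Set[str],
    pool: Dict[str, Set[str]],
    min_margin: int = 1,
) -> Optional[str]:
    """Return the pool candidate whose public cues overlap most with the
    anonymous record, or None when no candidate clearly stands out.

    Assumption: cues are plain strings and overlap is counted by set
    intersection; real linkage attacks use richer similarity measures.
    """
    # Score each candidate by the number of cues shared with the record.
    scored = sorted(
        ((len(anon_cues & cues), name) for name, cues in pool.items()),
        reverse=True,
    )
    if not scored:
        return None

    best_score, best_name = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0

    # Refuse to link when nothing matches or the lead is too small.
    if best_score == 0 or best_score - runner_up < min_margin:
        return None
    return best_name


# Hypothetical data: no single cue identifies anyone on its own.
pool = {
    "candidate_a": {"rated:film_x", "rated:film_y", "city:springfield"},
    "candidate_b": {"rated:film_x", "city:shelbyville"},
    "candidate_c": {"rated:film_z"},
}
anon = {"rated:film_x", "rated:film_y", "city:springfield"}

match = link_record(anon, pool)
ambiguous = link_record({"rated:film_x"}, pool)
```

In this toy pool, the single cue `rated:film_x` is shared by two candidates and yields no link, while the combination of three cues singles out one candidate. That is the sense in which individually non-identifying cues can become identifying once combined.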
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Ethics and Social Impacts of AI · Data Quality and Management
