From Weak Cues to Real Identities: Evaluating Inference-Driven De-Anonymization in LLM Agents

Myeongseob Ko, Jihyun Jeong, Sumiran Singh Thakur, Gyuhak Kim, and Ruoxi Jia

arXiv:2603.18382·cs.AI·March 20, 2026

Open Access

TL;DR

This paper demonstrates that large language model-based agents can autonomously infer real-world identities from scattered, non-identifying cues, posing a significant privacy risk that surpasses traditional explicit data disclosures.

Contribution

It formalizes the inference-driven linkage threat and systematically evaluates its effectiveness across multiple realistic scenarios, revealing substantial privacy vulnerabilities.

Findings

01. Agents successfully perform identity linkage without task-specific heuristics.

02. In the Netflix setting, agents reconstruct 79.2% of identities, outperforming the baseline.

03. Identity inference occurs even without explicit adversarial prompts.

Abstract

Anonymization is widely treated as a practical safeguard because re-identifying anonymous records was historically costly, requiring domain expertise, tailored algorithms, and manual corroboration. We study a growing privacy risk that may weaken this barrier: LLM-based agents can autonomously reconstruct real-world identities from scattered, individually non-identifying cues. By combining these sparse cues with public information, agents resolve identities without bespoke engineering. We formalize this threat as inference-driven linkage and systematically evaluate it across three settings: classical linkage scenarios (Netflix and AOL), InferLink (a controlled benchmark varying task intent, shared cues, and attacker knowledge), and modern text-rich artifacts. Without task-specific heuristics, agents successfully execute both fixed-pool matching and open-ended identity…
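To make the term "fixed-pool matching" concrete, the following toy sketch links one anonymized record to a pool of candidate profiles by overlap of weak cues. It is an illustration only and not the paper's method: the paper studies LLM agents that perform linkage without hand-written heuristics, whereas this is the kind of bespoke, hand-coded scoring such agents are said to make unnecessary. The function names (`jaccard`, `link_record`), the `min_margin` threshold, and all data are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap between two cue sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_record(
    anon_cues: Set[str],
    candidates: Dict[str, Set[str]],
    min_margin: float = 0.2,  # hypothetical threshold, chosen for illustration
) -> Tuple[Optional[str], float]:
    """Return (best candidate, score), or (None, score) when the match is ambiguous.

    A candidate is accepted only if it beats the runner-up by at least
    `min_margin`, so near-ties are reported as "no confident link".
    """
    scored = sorted(
        ((jaccard(anon_cues, cues), name) for name, cues in candidates.items()),
        reverse=True,
    )
    if not scored:
        return None, 0.0
    best_score, best_name = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    if best_score - runner_up < min_margin:
        return None, best_score
    return best_name, best_score


# Made-up cues: each item is individually non-identifying.
anon = {"film_a", "film_b", "film_c"}
pool = {
    "profile_1": {"film_a", "film_b", "film_c", "film_d"},
    "profile_2": {"film_a", "film_e"},
    "profile_3": {"film_f"},
}
print(link_record(anon, pool))  # ('profile_1', 0.75)
```

The margin check reflects a common design choice in linkage: a high top score means little if a second candidate scores almost as well, so the sketch declines to link in that case.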

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Ethics and Social Impacts of AI · Data Quality and Management