Large-scale online deanonymization with LLMs

Simon Lermen; Daniel Paleka; Joshua Swanson; Michael Aerni; Nicholas Carlini; Florian Tramèr

arXiv:2602.16800 · cs.CR · February 27, 2026

Open Access

TL;DR

This paper demonstrates that large language models can deanonymize pseudonymous online profiles across platforms by extracting identity-relevant features, searching for candidate matches, and verifying them, substantially outperforming classical methods.

Contribution

The paper introduces a scalable LLM-based deanonymization approach that operates directly on raw user content and outperforms traditional structured-data methods in multiple online settings.

Findings

01

LLMs achieve up to 68% recall at 90% precision in deanonymization tasks.

02

The approach outperforms classical baselines by a large margin.

03

Online privacy protections are significantly weakened by LLM-based deanonymization.
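The "recall at 90% precision" figure in the first finding can be read as follows: the attacker ranks its guesses by confidence, keeps only the most confident ones, and reports the largest share of targets it can correctly identify while at least 90% of the kept guesses are right. The sketch below is an illustration of that metric only; the function name `recall_at_precision` and the input layout are assumptions, not taken from the paper.

```python
from typing import List, Tuple


def recall_at_precision(
    preds: List[Tuple[float, bool]],
    n_targets: int,
    min_precision: float = 0.9,
) -> float:
    """Best recall achievable while keeping precision >= min_precision.

    preds: one (confidence, is_correct) pair per guess the attacker makes.
    n_targets: total number of profiles that have a true match.
    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    # Consider guesses from most to least confident.
    ranked = sorted(preds, key=lambda p: p[0], reverse=True)
    best_recall = 0.0
    correct = 0
    for kept, (_, is_correct) in enumerate(ranked, start=1):
        correct += is_correct
        precision = correct / kept
        if precision >= min_precision:
            # Keeping the top `kept` guesses satisfies the precision bar.
            best_recall = max(best_recall, correct / n_targets)
    return best_recall
```

Guesses below the chosen confidence cutoff count as abstentions, so they lower recall but do not hurt precision.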

Abstract

We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline that uses LLMs to: (1) extract identity-relevant features, (2) search for candidate matches via semantic embeddings, and (3) reason over top candidates to verify matches and reduce false positives. Compared to classical deanonymization work (e.g., on the Netflix prize) that required structured data, our approach works directly on raw user…
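The three-stage closed-world pipeline described in the abstract (extract, search, verify) might be organized roughly as in the sketch below. Everything here is a stand-in chosen for illustration: `toy_extract`, `toy_embed`, and `toy_verify` are hypothetical placeholders for the LLM feature extractor, the semantic embedding model, and the LLM verification step, and they use simple word overlap rather than any real model. Only the control flow is meant to mirror the description.

```python
import math
from collections import Counter
from typing import Callable, Dict, Optional

Vector = Dict[str, float]


def toy_extract(text: str) -> str:
    # Stage 1 placeholder: a real attack would prompt an LLM to pull out
    # identity-relevant features. Here we only lowercase the text.
    return text.lower()


def toy_embed(features: str) -> Vector:
    # Stage 2 placeholder for a semantic embedding: unit-norm bag of words.
    counts = Counter(features.split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a: Vector, b: Vector) -> float:
    # Both vectors are unit-norm, so the dot product is the cosine.
    return sum(v * b.get(word, 0.0) for word, v in a.items())


def toy_verify(query: str, candidate: str) -> bool:
    # Stage 3 placeholder: a real attack would have an LLM reason over the
    # pair. Here we accept only if at least two words are shared.
    return len(set(query.split()) & set(candidate.split())) >= 2


def link_profiles(
    queries: Dict[str, str],
    targets: Dict[str, str],
    k: int = 3,
    extract: Callable[[str], str] = toy_extract,
    embed: Callable[[str], Vector] = toy_embed,
    verify: Callable[[str, str], bool] = toy_verify,
) -> Dict[str, Optional[str]]:
    """Map each query profile to a target profile id, or None to abstain."""
    target_feats = {tid: extract(text) for tid, text in targets.items()}
    target_vecs = {tid: embed(feat) for tid, feat in target_feats.items()}
    matches: Dict[str, Optional[str]] = {}
    for qid, text in queries.items():
        q_feat = extract(text)
        q_vec = embed(q_feat)
        # Search: keep the k most similar candidates.
        top_k = sorted(
            target_vecs,
            key=lambda tid: cosine(q_vec, target_vecs[tid]),
            reverse=True,
        )[:k]
        # Verify: accept the first candidate that passes, else abstain.
        matches[qid] = next(
            (tid for tid in top_k if verify(q_feat, target_feats[tid])), None
        )
    return matches
```

Returning `None` when no candidate passes verification is what lets such a pipeline trade recall for precision: an abstention is not counted as a false positive.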

Taxonomy

Topics: Authorship Attribution and Profiling · Hate Speech and Cyberbullying Detection · Spam and Phishing Detection