Assessing Deanonymization Risks with Stylometry-Assisted LLM Agent
Boyang Zhang, Yang Zhang

TL;DR
This paper presents SALA (Stylometry-Assisted LLM Analysis), a framework that assesses and mitigates deanonymization risks in textual data by combining stylometric features with LLM reasoning. It reports high authorship-inference accuracy and proposes privacy-preserving rewriting strategies.
Contribution
Introduces SALA, an LLM-based method that integrates quantitative stylometric features with LLM reasoning for authorship attribution and privacy protection in textual data.
Findings
SALA achieves high inference accuracy on large-scale news datasets.
Augmenting SALA with a database improves robustness.
Recomposition strategies effectively reduce authorship identifiability.
Abstract
The rapid advancement of large language models (LLMs) has enabled powerful authorship inference capabilities, raising growing concerns about unintended deanonymization risks in textual data such as news articles. In this work, we introduce an LLM agent designed to evaluate and mitigate such risks through a structured, interpretable pipeline. Central to our framework is the proposed SALA (Stylometry-Assisted LLM Analysis) method, which integrates quantitative stylometric features with LLM reasoning for robust and transparent authorship attribution. Experiments on large-scale news datasets demonstrate that SALA, particularly when augmented with a database module, achieves high inference accuracy in various scenarios. Finally, we propose a guided recomposition strategy that leverages the agent's reasoning trace to generate rewriting prompts, effectively reducing…
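To make "quantitative stylometric features" concrete, the following is a minimal Python sketch of how a handful of such features might be computed and rendered as text for an LLM prompt. The feature set, the `FUNCTION_WORDS` list, and the function names `stylometric_features` and `features_to_prompt` are illustrative assumptions, not the features or interface used by SALA, which this summary does not specify.

```python
import re
from collections import Counter

# Hypothetical function-word list, chosen only for illustration.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is")


def stylometric_features(text: str) -> dict:
    """Compute a few simple, commonly used stylometric features."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)  # avoid division by zero on empty input
    counts = Counter(words)

    features = {
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "type_token_ratio": len(counts) / n_words,
        "comma_rate": text.count(",") / n_words,
    }
    # Relative frequency of each function word.
    for fw in FUNCTION_WORDS:
        features[f"fw_{fw}"] = counts[fw] / n_words
    return features


def features_to_prompt(features: dict) -> str:
    """Render the features as plain text that could be placed in an LLM prompt."""
    lines = [f"- {name}: {value:.3f}" for name, value in sorted(features.items())]
    return "Stylometric profile of the text:\n" + "\n".join(lines)


if __name__ == "__main__":
    sample = "The cat sat. The dog ran, and the bird flew."
    print(features_to_prompt(stylometric_features(sample)))
```

Numeric features like these give the model fixed, inspectable evidence to reason over alongside the raw text, which is one plausible reading of how a stylometry-assisted pipeline stays interpretable.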
Taxonomy
Topics: Authorship Attribution and Profiling · Topic Modeling · Hate Speech and Cyberbullying Detection
