R.R.: Unveiling LLM Training Privacy through Recollection and Ranking

Wenlong Meng; Zhenyuan Guo; Lenan Wu; Chen Gong; Wenyan Liu; Weixian Li; Chengkun Wei; Wenzhi Chen

arXiv:2502.12658 · cs.CL · June 11, 2025

TL;DR

This paper introduces R.R. (Recollect and Rank), a two-step attack that reconstructs personally identifiable information (PII) that was masked in an LLM's scrubbed training data, showing that the model can leak PII even after data masking.

Contribution

The paper presents R.R., which combines a recollection prompt (the LLM repeats a masked text while filling in the masks) with a scoring criterion that ranks the resulting PII candidates, allowing an attacker to recover PII entities from masked training data.

Findings

01 R.R. outperforms baselines in PII identification accuracy

02 LLMs leak PII even with scrubbed training data

03 The attack demonstrates significant privacy vulnerabilities in LLMs

Abstract

Large Language Models (LLMs) pose significant privacy risks, potentially leaking training data due to implicit memorization. Existing privacy attacks primarily focus on membership inference attacks (MIAs) or data extraction attacks, but reconstructing specific personally identifiable information (PII) in LLMs' training data remains challenging. In this paper, we propose R.R. (Recollect and Rank), a novel two-step privacy stealing attack that enables attackers to reconstruct PII entities from scrubbed training data where the PII entities have been masked. In the first stage, we introduce a prompt paradigm named recollection, which instructs the LLM to repeat a masked text but fill in masks. Then we can use PII identifiers to extract recollected PII candidates. In the second stage, we design a new criterion to score each PII candidate and rank them. Motivated by membership inference, we…
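The two stages described in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the prompt wording, the `[MASK]` token, and the function names are hypothetical; candidates are recovered by aligning each model output against the masked template, a simple stand-in for the PII identifiers the paper mentions; and the ranking score is supplied by the caller, because the abstract above is truncated before it defines the paper's criterion.

```python
import re
from typing import Callable, List

MASK = "[MASK]"  # hypothetical mask token; the source does not specify one


def build_recollection_prompt(masked_text: str) -> str:
    """Stage 1: ask the model to repeat the text and fill in the masks.

    The wording is an assumption, not the prompt used in the paper.
    """
    return (
        f"Repeat the following text exactly, but replace every {MASK} "
        f"with the original content:\n\n{masked_text}"
    )


def extract_candidates(masked_text: str, recollected: str) -> List[str]:
    """Return the strings that fill the masks in one recollected output.

    Stand-in for a PII identifier: the masked text is turned into a regex
    whose capture groups sit where the masks were. Returns [] when the
    output does not follow the template.
    """
    parts = masked_text.split(MASK)
    pattern = "(.+?)".join(re.escape(part) for part in parts)
    match = re.fullmatch(pattern, recollected, flags=re.DOTALL)
    return [group.strip() for group in match.groups()] if match else []


def rank_candidates(
    candidates: List[str], score: Callable[[str], float]
) -> List[str]:
    """Stage 2: deduplicate candidates and sort them by descending score.

    `score` is a placeholder for the paper's criterion, which is not
    reproduced here.
    """
    unique = list(dict.fromkeys(candidates))  # keeps first-seen order
    return sorted(unique, key=score, reverse=True)
```

In use, an attacker would send `build_recollection_prompt(masked_text)` to the target model several times, pass each sampled output through `extract_candidates`, and hand the pooled candidates to `rank_candidates`. A toy score such as how often a candidate appears across samples is enough to exercise the sketch, but it should not be read as the criterion proposed in the paper.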

Code & Models

Repositories

meng-wenlong/rr
pytorch · Official

Taxonomy

Topics: Dispute Resolution and Class Actions · Artificial Intelligence in Law · Law, AI, and Intellectual Property

Methods: Focus