Retrieval-Confused Generation is a Good Defender for Privacy Violation Attack of Large Language Models

Wanli Peng; Xin Chen; Hang Fu; XinYu He; Xue Yiming; Juan Wen

arXiv:2506.19889 · cs.CR · June 26, 2025

TL;DR

This paper introduces a retrieval-confused generation method that defends large language models against privacy violation attacks. It leads the model to produce misleading responses through a novel retrieval strategy, improving privacy protection without high inference costs.

Contribution

The paper proposes a novel retrieval-confused generation framework that covertly defends against privacy attacks. It rewrites the user comments in the attack query and retrieves irrelevant data in their place. This enhances privacy without costly inference and without exposing the defense strategy to the attacker.
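The retrieval step described above can be illustrated with a toy sketch. The idea is to pick the candidate that is least related to the attacker's input, rather than the most related one, and substitute it before the LLM sees the query.

This is a minimal illustration under stated assumptions, not the authors' implementation:

- It skips the LLM paraphrasing step entirely.
- It assumes a pool of already-rewritten comments exists.
- It uses bag-of-words cosine similarity as a stand-in for whatever retriever the paper actually uses.
- All function names, the prompt template, and the sample comments are hypothetical.

```python
import math
from collections import Counter


def bow_vector(text: str) -> Counter:
    """Lowercased bag-of-words counts (hypothetical stand-in for a real embedding)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def least_relevant(query_comments: str, database: list[str]) -> str:
    """Return the database entry LEAST similar to the query's user comments."""
    q = bow_vector(query_comments)
    return min(database, key=lambda entry: cosine(q, bow_vector(entry)))


def defend_query(template: str, query_comments: str, database: list[str]) -> str:
    """Swap the user's comments for an irrelevant entry before the LLM answers."""
    return template.format(comments=least_relevant(query_comments, database))


# Hypothetical pool of rewritten comments and a hypothetical attack prompt.
db = [
    "just finished my shift at the hospital in boston",
    "love hiking near the alps on weekends",
    "my kids started school in toronto this week",
]
template = "Infer the author's location from these comments: {comments}"
user = "walked past the harbor in boston after my shift"

print(defend_query(template, user, db))
# prints: Infer the author's location from these comments: love hiking near the alps on weekends
```

In this toy run, the attacker asks about comments that mention Boston. The model instead receives an unrelated comment about the Alps, so any location it infers is wrong. The defense still answers rather than refusing, which is the covertness property the summary emphasizes.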

Findings

01. Effective in defending against privacy violation attacks.
02. Outperforms existing anonymization-based defenses in experiments.
03. Works across multiple datasets and large language models.

Abstract

Recent advances in large language models (LLMs) have had a profound impact on our society and have also raised new security concerns. In particular, owing to the remarkable inference ability of LLMs, the privacy violation attack (PVA) revealed by Staab et al. poses serious personal privacy risks. Existing defense methods mainly leverage LLMs to anonymize the input query, which incurs costly inference time and fails to achieve satisfactory defense performance. Directly rejecting PVA queries may seem like an effective defense, but it exposes the defense method and thereby promotes the evolution of PVA. In this paper, we propose a novel defense paradigm based on retrieval-confused generation (RCG) of LLMs, which can efficiently and covertly defend against PVA. We first design a paraphrasing prompt to induce the LLM to rewrite the "user comments" of the attack query to construct a…