Learning to Refuse: Towards Mitigating Privacy Risks in LLMs

Zhenhua Liu; Tong Zhu; Chuanyuan Tan; Wenliang Chen

arXiv:2407.10058·cs.CL·September 17, 2024

TL;DR

This paper introduces a new dataset and framework for enabling large language models to unlearn specific private data, effectively balancing privacy protection with model utility.

Contribution

It presents the Name-Aware Unlearning Framework (NAUF) and a real-world dataset for evaluating machine unlearning in privacy-sensitive scenarios.

Findings

01 NAUF achieves a 5.65-point improvement in unlearning score over baselines.

02 The framework effectively protects individual privacy without degrading unrelated question answering.

03 The dataset enables realistic evaluation of unlearning methods in practical settings.

Abstract

Large language models (LLMs) exhibit remarkable capabilities in understanding and generating natural language. However, these models can inadvertently memorize private information, posing significant privacy risks. This study addresses the challenge of enabling LLMs to protect specific individuals' private data without the need for complete retraining. We propose RETURN, a Real-world pErsonal daTa UnleaRNing dataset, comprising 2,492 individuals from Wikipedia with associated QA pairs, to evaluate machine unlearning (MU) methods for protecting personal data in a realistic scenario. Additionally, we introduce the Name-Aware Unlearning Framework (NAUF) for Privacy Protection, which enables the model to learn which individuals' information should be protected without affecting its ability to answer questions related to other unrelated individuals. Our extensive experiments demonstrate…
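To make the setup in the abstract concrete, the following is a minimal sketch of how QA pairs keyed by individual might be split into a "forget" set (protected individuals, whose questions get a refusal as the training target) and a "retain" set (everyone else, whose original answers are kept). This is an illustration under stated assumptions, not the authors' implementation: the `QAPair` type, the `build_targets` helper, the refusal string, and the example names are all hypothetical, and the actual NAUF training objective is not described in the text above.

```python
from dataclasses import dataclass

# Hypothetical refusal text; the wording used in the paper is not given here.
REFUSAL = "I'm sorry, I can't share information about this person."


@dataclass(frozen=True)
class QAPair:
    """One question-answer pair about a named individual (assumed schema)."""
    name: str
    question: str
    answer: str


def build_targets(pairs, protected_names):
    """Split QA pairs into (forget, retain) lists of (question, target) tuples.

    Questions about protected individuals are paired with a refusal;
    all other questions keep their original answer.
    """
    protected = {n.lower() for n in protected_names}
    forget, retain = [], []
    for p in pairs:
        if p.name.lower() in protected:
            forget.append((p.question, REFUSAL))
        else:
            retain.append((p.question, p.answer))
    return forget, retain


# Made-up example data, not drawn from the RETURN dataset.
pairs = [
    QAPair("Alice Example", "Where was Alice Example born?", "Springfield"),
    QAPair("Bob Sample", "What is Bob Sample's profession?", "Architect"),
]
forget, retain = build_targets(pairs, ["Alice Example"])
```

Keying the split on the individual's name, rather than on individual facts, mirrors the abstract's description of a model that learns which individuals to protect while continuing to answer questions about unrelated people.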

Code & Models

Repositories

zhliu0106/learning-to-refuse
PyTorch · Official

Datasets

zhliu/RETURN
dataset · 46 downloads

Taxonomy

Topics: Big Data and Business Intelligence · Law, AI, and Intellectual Property · Cybercrime and Law Enforcement Studies