LLM-PBE: Assessing Data Privacy in Large Language Models

Qinbin Li; Junyuan Hong; Chulin Xie; Jeffrey Tan; Rachel Xin; Junyi Hou; Xavier Yin; Zhun Wang; Dan Hendrycks; Zhangyang Wang; Bo Li; Bingsheng He; Dawn Song

arXiv:2408.12787 · cs.CR · September 9, 2024 · 2 cites

TL;DR

This paper introduces LLM-PBE, a comprehensive toolkit for systematically evaluating data privacy risks in large language models across their lifecycle, addressing a critical gap in privacy assessment methods.

Contribution

The paper presents LLM-PBE, the first toolkit designed to assess data privacy risks in LLMs, incorporating diverse attack and defense strategies and analyzing various data types and model factors.

Findings

01 Model size influences privacy risk levels

02 Data characteristics affect leakage susceptibility

03 Temporal factors impact privacy vulnerabilities

Abstract

Large Language Models (LLMs) have become integral to numerous domains, significantly advancing applications in data management, mining, and analysis. Their profound capabilities in processing and interpreting complex language data, however, bring to light pressing concerns regarding data privacy, especially the risk of unintentional training data leakage. Despite the critical nature of this issue, there has been no existing literature to offer a comprehensive assessment of data privacy risks in LLMs. Addressing this gap, our paper introduces LLM-PBE, a toolkit crafted specifically for the systematic evaluation of data privacy risks in LLMs. LLM-PBE is designed to analyze privacy across the entire lifecycle of LLMs, incorporating diverse attack and defense strategies, and handling various data types and metrics. Through detailed experimentation with multiple LLMs, LLM-PBE facilitates an…

Code & Models

Repositories

QinbinLi/LLM-PBE
PyTorch · Official

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Data Quality and Management