Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation

Zhexin Zhang; Jiaxin Wen; Minlie Huang

arXiv:2307.04401 · cs.CL · July 11, 2023

TL;DR

Ethicist combines loss-smoothed soft prompting with calibrated confidence estimation to extract targeted training data (the suffix that follows a given prefix) from large language models, highlighting the privacy risks of memorization.

Contribution

The paper presents Ethicist, a new approach for targeted training data extraction from language models that pairs loss-smoothed soft prompting with calibrated confidence estimation to improve extraction accuracy.
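The selection step, picking one suffix from several sampled candidates and attaching a confidence to it, can be illustrated with a simple normalization. This is a minimal sketch under stated assumptions: each sampled suffix is assumed to come with its sequence log-probability under the attacked model, and a candidate's confidence is taken to be its probability share among the distinct sampled suffixes. It is not necessarily the calibration Ethicist uses, and `select_suffix` is a hypothetical helper.

```python
from __future__ import annotations

import math


def select_suffix(samples: list[tuple[str, float]]) -> tuple[str, float]:
    """Return the most probable sampled suffix and a confidence normalized over the candidates.

    `samples` holds (suffix_text, log_prob) pairs, one per sampled generation; the same
    suffix may appear several times if it was sampled more than once.
    Hypothetical scoring rule, not the paper's exact estimator.
    """
    if not samples:
        raise ValueError("need at least one sampled suffix")

    # Deduplicate: keep one log-probability per distinct suffix.
    unique: dict[str, float] = {}
    for text, log_prob in samples:
        unique[text] = max(log_prob, unique.get(text, -math.inf))

    # Most probable candidate under the attacked model (ties go to the first one seen).
    best = max(unique, key=unique.__getitem__)

    # Probability share among distinct candidates, computed with the log-sum-exp
    # trick so that very negative log-probabilities do not underflow.
    top = unique[best]
    normalizer = sum(math.exp(lp - top) for lp in unique.values())
    confidence = 1.0 / normalizer  # equals exp(top - top) / normalizer
    return best, confidence
```

Because the confidence is relative to the other candidates for the same prefix, it could be used to rank predictions across many prefixes and report only the most confident extractions.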

Findings

01. Significantly improves data extraction performance on a public benchmark.

02. Effective across different model scales and decoding strategies.

03. Provides insights into factors affecting extraction success.

Abstract

Large pre-trained language models achieve impressive results across many tasks. However, recent works point out that pre-trained language models may memorize a considerable fraction of their training data, leading to the privacy risk of information leakage. In this paper, we propose a method named Ethicist for targeted training data extraction through loss smoothed soft prompting and calibrated confidence estimation, investigating how to recover the suffix in the training data when given a prefix. To elicit memorization in the attacked model, we tune soft prompt embeddings while keeping the model fixed. We further propose a smoothing loss that smooths the loss distribution of the suffix tokens to make it easier to sample the correct suffix. In order to select the most probable suffix from a collection of sampled suffixes and estimate the prediction confidence, we propose a calibrated…
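The smoothing loss described above can be sketched in plain Python. This is a minimal sketch under an assumption: that the smoothing term averages the N largest per-token losses of the gold suffix and is added, with weight alpha, to the usual mean token loss. The names `token_nlls`, `smoothed_loss`, `top_n`, and `alpha` are hypothetical. In the actual attack these per-token losses would come from the frozen model conditioned on the tuned soft prompt embeddings, and only those embeddings would receive gradients.

```python
from __future__ import annotations

import math
from typing import Sequence


def token_nlls(gold_token_probs: Sequence[float]) -> list[float]:
    """Per-token negative log-likelihoods, given the model's probability of each gold suffix token."""
    return [-math.log(p) for p in gold_token_probs]


def smoothed_loss(nlls: Sequence[float], top_n: int, alpha: float) -> float:
    """Mean token loss plus `alpha` times the mean of the `top_n` largest token losses.

    Hypothetical form of the smoothing objective: the extra term concentrates on the
    suffix tokens the model currently finds hardest to predict.
    """
    if not nlls:
        raise ValueError("suffix must contain at least one token")
    if top_n < 1:
        raise ValueError("top_n must be positive")
    mle = sum(nlls) / len(nlls)
    n = min(top_n, len(nlls))  # cannot take more tokens than the suffix has
    hardest = sorted(nlls, reverse=True)[:n]
    return mle + alpha * sum(hardest) / n
```

For per-token losses `[0.1, 2.0, 0.3, 1.0]` with `top_n=2` and `alpha=1.0`, the mean loss is 0.85 and the two hardest tokens average 1.5, giving 2.35. Weighting the hardest tokens more heavily pushes the optimization to lift exactly the positions where sampling is most likely to go wrong, so the per-token losses become more uniform; that is one reading of "smoothing the loss distribution".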

Code & Models

Repositories

thu-coai/targeted-data-extraction (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis