Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation
Zhexin Zhang, Jiaxin Wen, Minlie Huang

TL;DR
Ethicist introduces a novel method combining loss smoothing, soft prompting, and calibrated confidence estimation to effectively extract specific training data from large language models, highlighting privacy risks.
Contribution
The paper presents Ethicist, a new approach for targeted data extraction from language models using loss smoothing and confidence calibration, improving extraction accuracy.
Findings
Significantly improves data extraction performance on benchmark.
Effective across different model scales and decoding strategies.
Provides insights into factors affecting extraction success.
Abstract
Large pre-trained language models achieve impressive results across many tasks. However, recent works point out that pre-trained language models may memorize a considerable fraction of their training data, leading to the privacy risk of information leakage. In this paper, we propose a method named Ethicist for targeted training data extraction through loss smoothed soft prompting and calibrated confidence estimation, investigating how to recover the suffix in the training data when given a prefix. To elicit memorization in the attacked model, we tune soft prompt embeddings while keeping the model fixed. We further propose a smoothing loss that smooths the loss distribution of the suffix tokens to make it easier to sample the correct suffix. In order to select the most probable suffix from a collection of sampled suffixes and estimate the prediction confidence, we propose a calibrated…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
