Differentiation-Based Extraction of Proprietary Data from Fine-Tuned LLMs
Zongjie Li, Daoyuan Wu, Shuai Wang, Zhendong Su

TL;DR
This paper introduces a novel method called Differentiated Data Extraction (DDE) for extracting proprietary data from fine-tuned LLMs, demonstrating its effectiveness and proposing defenses to mitigate such risks.
Contribution
It formulates the data extraction problem for fine-tuned LLMs, develops DDE exploiting model confidence differences, and proposes defenses, highlighting data leak vulnerabilities.
Findings
DDE outperforms existing extraction methods across multiple domains.
Extraction feasibility demonstrated in various real-world scenarios.
Proposed defenses effectively mitigate DDE attacks with minimal performance impact.
Abstract
The increasing demand for domain-specific and human-aligned Large Language Models (LLMs) has led to the widespread adoption of Supervised Fine-Tuning (SFT) techniques. SFT datasets often comprise valuable instruction-response pairs, making them attractive targets for extraction. This paper studies this critical research problem for the first time. We start by formally defining and formulating the problem, then explore various attack goals, types, and variants based on the unique properties of SFT data in real-world scenarios. Based on our analysis of how direct extraction behaves, we develop a novel extraction method specifically designed for SFT models, called Differentiated Data Extraction (DDE), which exploits the confidence levels of fine-tuned models and their behavioral differences from pre-trained base models. Through extensive experiments across…
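The abstract is truncated here and does not spell out how DDE works, so the following is only a toy sketch of the general idea it names: prefer continuations where the fine-tuned model is confident and the base model is not. The function names, the `min_confidence` threshold, the log-ratio score, and the toy probability tables are all assumptions made for illustration, not the authors' algorithm.

```python
import math

def differential_scores(ft_probs, base_probs, eps=1e-12):
    """Score each candidate token by log p_ft(token) - log p_base(token).

    Hypothetical scoring rule: a large positive score means the fine-tuned
    model favors the token far more than the base model does.
    """
    return {
        tok: math.log(p_ft + eps) - math.log(base_probs.get(tok, 0.0) + eps)
        for tok, p_ft in ft_probs.items()
    }

def pick_token(ft_probs, base_probs, min_confidence=0.2):
    """Among tokens the fine-tuned model is confident in, pick the one
    with the largest fine-tuned vs. base log-probability gap."""
    confident = {t: p for t, p in ft_probs.items() if p >= min_confidence}
    if not confident:
        # Fall back to plain greedy decoding on the fine-tuned model.
        return max(ft_probs, key=ft_probs.get)
    scores = differential_scores(confident, base_probs)
    return max(scores, key=scores.get)

# Toy next-token distributions (made-up numbers, not experimental data).
ft = {"the": 0.40, "aspirin": 0.35, "a": 0.25}
base = {"the": 0.50, "aspirin": 0.01, "a": 0.30}

greedy = max(ft, key=ft.get)   # what plain greedy decoding would choose
chosen = pick_token(ft, base)  # what the differential rule chooses
print(greedy, chosen)
```

In this toy case greedy decoding follows the generic token, while the differential rule selects the token whose probability rose most after fine-tuning, which is the kind of signal the abstract describes as a behavioral difference from the base model.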
Taxonomy
Methods: Balanced Selection · Shrink and Fine-Tune
