Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!

Zhexin Zhang; Yuhao Sun; Junxiao Yang; Shiyao Cui; Yuanchao Zhang; Hongning Wang; Minlie Huang

arXiv:2505.15656·cs.CL·April 6, 2026

TL;DR

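This paper uncovers a significant privacy risk in fine-tuning open-source LLMs: the creator of an open-source model can use simple backdoor training to later extract proprietary fine-tuning data from the downstream model, needing only black-box access to it. The paper reports high extraction success rates and discusses a detection-based defense.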

Contribution

The paper reveals a novel backdoor attack that lets the creator of an open-source LLM extract proprietary fine-tuning data from models fine-tuned on it, highlighting a critical privacy vulnerability.

Findings

01

In practical settings, up to 76.3% of the downstream fine-tuning queries, out of 5,000 samples in total, can be perfectly extracted.

02

In more ideal settings, the extraction success rate can rise to 94.9%.

03

Detection-based defenses can be bypassed with improved attacks.

Abstract

Fine-tuning on open-source Large Language Models (LLMs) with proprietary data is now a standard practice for downstream developers to obtain task-specific LLMs. Surprisingly, we reveal a new and concerning risk along with the practice: the creator of the open-source LLMs can later extract the private downstream fine-tuning data through simple backdoor training, only requiring black-box access to the fine-tuned downstream model. Our comprehensive experiments, across 4 popularly used open-source models with 3B to 32B parameters and 2 downstream datasets, suggest that the extraction performance can be strikingly high: in practical settings, as much as 76.3% downstream fine-tuning data (queries) out of a total 5,000 samples can be perfectly extracted, and the success rate can increase to 94.9% in more ideal settings. We also explore a detection-based defense strategy but find it can be…
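To make the headline figure concrete, the sketch below shows one way a "perfectly extracted" rate could be computed: the share of training queries that appear verbatim among the strings recovered from the fine-tuned model. The function name `exact_extraction_rate`, the whitespace stripping, and the toy data are assumptions for illustration; the paper's own evaluation protocol may differ.

```python
def exact_extraction_rate(training_queries, extracted_outputs):
    """Fraction of distinct training queries found verbatim in the extracted outputs.

    Hypothetical helper: exact string match after stripping surrounding
    whitespace is an assumed matching rule, not the paper's stated metric.
    """
    targets = {q.strip() for q in training_queries}
    if not targets:
        return 0.0
    recovered = targets & {o.strip() for o in extracted_outputs}
    return len(recovered) / len(targets)


# Toy data (invented for illustration, not from the paper).
training = [
    "What is the capital of France?",
    "Summarize this contract.",
    "Translate 'hello' to Spanish.",
    "List three prime numbers.",
]
extracted = [
    "What is the capital of France?",   # exact match
    "Summarize this contract.",         # exact match
    "Translate 'hello' into Spanish.",  # near miss, not counted
    "An unrelated generation.",         # not a training query
]

rate = exact_extraction_rate(training, extracted)
print(f"exact extraction rate: {rate:.1%}")

# Sample count implied by the abstract: 76.3% of 5,000 queries.
implied_count = round(0.763 * 5000)
print(f"76.3% of 5,000 samples = {implied_count} queries")
```

Under this assumed rule a near miss counts as a failure, so the abstract's 76.3% would correspond to 3,815 of the 5,000 queries being reproduced exactly.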

Code & Models

Videos

Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!· slideslive