Fine-tuning can Help Detect Pretraining Data from Large Language Models

Hengxiang Zhang; Songxin Zhang; Bingyi Jing; Hongxin Wei

arXiv:2410.10880·cs.CL·March 18, 2025

TL;DR

This paper introduces Fine-tuned Score Deviation (FSD), a method that fine-tunes a large language model on a small amount of unseen data to improve the detection of its pretraining data. It addresses the suboptimal performance of existing scoring functions such as Perplexity and Min-k%.

Contribution

The paper proposes FSD, a scoring method that improves pretraining data detection by measuring how a text's detection score deviates after the model is fine-tuned on unseen data. It outperforms the existing scoring functions it builds on.
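
To make the scoring idea concrete, here is a minimal Python sketch, not the authors' implementation. It assumes the per-token log-probabilities of a text are already available from both the original model and a copy fine-tuned on unseen (non-member) data, and it takes the deviation as the base score minus the fine-tuned score. The helper names `log_perplexity` and `fsd_score` are hypothetical, and the toy inputs are invented numbers, not results from the paper.

```python
from typing import Callable, Sequence

# A score function maps the per-token log-probabilities of one text to a scalar.
ScoreFn = Callable[[Sequence[float]], float]


def log_perplexity(token_logprobs: Sequence[float]) -> float:
    """Negative mean token log-probability (the log of perplexity)."""
    return -sum(token_logprobs) / len(token_logprobs)


def fsd_score(
    base_logprobs: Sequence[float],
    finetuned_logprobs: Sequence[float],
    score_fn: ScoreFn = log_perplexity,
) -> float:
    """Hypothetical helper: how much a text's score changes after fine-tuning.

    base_logprobs: token log-probs under the original model (assumed given).
    finetuned_logprobs: token log-probs under the model fine-tuned on
        unseen (non-member) data (assumed given).
    Any existing score function can be plugged in as score_fn.
    """
    return score_fn(base_logprobs) - score_fn(finetuned_logprobs)


# Invented toy log-probs, used only to exercise the code.
member = fsd_score([-1.0, -1.2], [-0.95, -1.15])
non_member = fsd_score([-3.0, -2.8], [-2.0, -1.9])
print(f"member deviation: {member:.2f}, non-member deviation: {non_member:.2f}")
```

Because `score_fn` is a parameter, the same wrapper can sit on top of Perplexity, Min-k%, or any other score, which is the sense in which FSD improves current scoring functions rather than replacing them.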

Findings

1. FSD significantly improves AUC scores on benchmark datasets.
2. Fine-tuning on unseen data enhances the distinction between members and non-members.
3. The method is effective across various large language models.
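
AUC here is the area under the ROC curve of a member-vs-non-member detector. A minimal Python sketch via its rank interpretation follows. It assumes higher scores mean "more likely a member," so perplexity-style scores should be negated first. `auc` is a hypothetical helper; in practice scikit-learn's `roc_auc_score` computes the same quantity.

```python
from typing import Sequence


def auc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """Probability that a random member outscores a random non-member.

    Ties count as half a win. O(n * m), which is fine for a sketch.
    """
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


print(auc([0.9, 0.8, 0.4], [0.5, 0.3]))
```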

Abstract

In the era of large language models (LLMs), detecting pretraining data has become increasingly important due to concerns about fair evaluation and ethical risks. Current methods differentiate members and non-members by designing scoring functions, such as Perplexity and Min-k%. However, the diversity and complexity of training data magnify the difficulty of distinguishing them, leading to suboptimal performance in detecting pretraining data. In this paper, we first explore the benefits of unseen data, which can be easily collected after the release of the LLM. We find that the perplexities of LLMs shift differently for members and non-members after fine-tuning with a small amount of previously unseen data. In light of this, we introduce a novel and effective method termed Fine-tuned Score Deviation (FSD), which improves the performance of current scoring functions for pretraining data…
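
The abstract names Perplexity and Min-k% as the scoring functions that FSD builds on. Below is a minimal Python sketch of both, assuming the per-token log-probabilities of a text under the model are already available. `min_k_percent` follows the usual Min-k% Prob definition: the average log-probability of the k fraction of least likely tokens. The default k = 0.2 is an illustrative assumption.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp of the negative mean token log-probability; lower suggests a member."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def min_k_percent(token_logprobs: Sequence[float], k: float = 0.2) -> float:
    """Mean log-probability of the k fraction of lowest-probability tokens.

    Higher values suggest a member; the intuition is that unseen text tends
    to contain a few outlier tokens with very low probability.
    """
    n = max(1, int(len(token_logprobs) * k))  # keep at least one token
    lowest = sorted(token_logprobs)[:n]
    return sum(lowest) / n


logprobs = [-0.1, -0.5, -2.3, -0.2, -4.0]
print(perplexity(logprobs), min_k_percent(logprobs, k=0.4))
```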

Peer Reviews

Decision: ICLR 2025 Poster

Reviewer 01 · Rating 5 · Confidence 5

Strengths

1. The idea is simple and intriguing: enlarging the difference between members and non-members could boost any MIA method.
2. The idea works for any MIA method.

Weaknesses

The evaluation has a major flaw. It only uses benchmarks that separate members and non-members based on cut-off dates. As Duan et al., 2024, Das et al., 2024, and Maini et al., 2024 show, this is fundamentally wrong because it introduces a temporal shift that biases the benchmark. These works have consistently proved the need to avoid evaluating MIA based on cut-off dates. This paper acknowledges these works on lines 458-459 and questions whether the results of this ICLR submission are influenced b

Reviewer 02 · Rating 6 · Confidence 3

Strengths

Very interesting observation and notable improvement in pretraining data detection accuracy. I think this paper has a clearly defined objective and interesting empirical results. The results are strong, showing substantial improvements in AUC and TPR at low FPR across the selected datasets and models. FSD helps improve existing score-based pretraining data detection methods.
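
TPR at low FPR asks how many members a detector catches while it flags almost no non-members. A minimal Python sketch follows, assuming higher scores mean "member" and a threshold rule of score >= t. `tpr_at_fpr` and its 5% default are illustrative choices, not the paper's evaluation code.

```python
from typing import Sequence


def tpr_at_fpr(
    member_scores: Sequence[float],
    nonmember_scores: Sequence[float],
    max_fpr: float = 0.05,
) -> float:
    """Best TPR over thresholds t whose FPR stays at or below max_fpr.

    A text is predicted to be a member iff its score >= t. A threshold
    above every score gives TPR = FPR = 0, hence the 0.0 starting value.
    """
    best = 0.0
    for t in sorted(set(member_scores) | set(nonmember_scores)):
        fpr = sum(s >= t for s in nonmember_scores) / len(nonmember_scores)
        if fpr <= max_fpr:
            tpr = sum(s >= t for s in member_scores) / len(member_scores)
            best = max(best, tpr)
    return best


members = [0.9, 0.8, 0.7, 0.2]
non_members = [0.75, 0.1, 0.05, 0.0]
print(tpr_at_fpr(members, non_members, max_fpr=0.0))
```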

Weaknesses

1. Have you experimented on the MIMIR dataset? This dataset seems to be more challenging, and its authors claim this is because seen and unseen examples are not drawn from different temporal distributions. I am very interested to see how this method helps MIA on the MIMIR dataset.
2. They have not explained or given intuition about why fine-tuning increases the gap between distributions.
3. The method requires fine-tuning the LLM, which makes it more expensive than existing methods. If authors find s

Reviewer 03 · Rating 8 · Confidence 3

Strengths

* Clear introduction to the problem and motivation.
* The method is well explained and supported with intuitive examples, such as Figure 2.
* Comparisons with other relevant baselines, showing significant improvement over them.
* Thorough experiments, including ablation studies that address additional research questions.
* Creative use of non-member data.

Weaknesses

* The paper assumes access to unseen data in the same domain but doesn't define 'domain' clearly. Could the authors explain how they handle differences between domains, especially if vocabularies differ a lot, and how they decide what data counts as 'same domain'?
* *"To the best of our knowledge, our method is the first to utilize some collected non-members in the task of pretraining data detection"* - There's limited information on how to collect non-member data, which seems a key aspect of yo

Videos

Fine-tuning can Help Detect Pretraining Data from Large Language Models · slideslive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques