Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method

Weichao Zhang; Ruqing Zhang; Jiafeng Guo; Maarten de Rijke; Yixing Fan; Xueqi Cheng

arXiv:2409.14781 · cs.CL · May 22, 2025

TL;DR

This paper introduces a divergence-based calibration method for detecting whether texts were part of an LLM's training data, improving accuracy over existing methods, and provides a new Chinese benchmark for evaluation.

Contribution

The paper proposes a novel divergence-based calibration approach for pretraining data detection and introduces the PatentMIA benchmark for Chinese texts.

Findings

1. The proposed method outperforms existing detection approaches.
2. Experimental results show significant accuracy improvements.
3. The method is effective on both English and Chinese datasets.

Abstract

As the scale of training corpora for large language models (LLMs) grows, model developers become increasingly reluctant to disclose details about their data. This lack of transparency poses challenges to scientific evaluation and ethical deployment. Recently, pretraining data detection approaches, which infer whether a given text was part of an LLM's training data through black-box access, have been explored. The Min-K% Prob method, which has achieved state-of-the-art results, assumes that a non-training example tends to contain a few outlier words with low token probabilities. However, its effectiveness may be limited, as it tends to misclassify non-training texts that contain many common words with high probabilities predicted by LLMs. To address this issue, we introduce a divergence-based calibration method, inspired by the divergence-from-randomness concept, to calibrate token…
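To make the Min-K% Prob baseline mentioned in the abstract concrete, here is a minimal sketch of its scoring rule: average the log-probabilities of the k% of tokens the model finds least likely, and treat a higher score as evidence that the text was seen in training. The function name, the 20% setting, and all log-probability values are assumptions invented for illustration. This is not the paper's calibrated method, which the truncated abstract does not fully describe.

```python
def min_k_percent_score(token_log_probs, k_percent=20):
    """Mean log-probability of the k% of tokens with the lowest log-probability.

    Hypothetical helper: a higher (less negative) score is read as
    "more likely to have been in the training data".
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Keep at least one token even for very short texts.
    n = max(1, len(token_log_probs) * k_percent // 100)
    lowest = sorted(token_log_probs)[:n]
    return sum(lowest) / n


# Invented per-token log-probabilities, not real model outputs.
seen_like = [-0.1, -0.3, -0.2, -0.5, -0.4, -0.2, -0.1, -0.6, -0.3, -0.2]

# A non-training text with two low-probability outlier tokens:
# the case the Min-K% Prob assumption handles well.
unseen_like = [-0.1, -0.3, -6.0, -0.5, -0.4, -0.2, -7.5, -0.6, -0.3, -0.2]

# A non-training text made only of common, high-probability words:
# the failure case the abstract describes.
common_unseen = [-0.2, -0.1, -0.3, -0.2, -0.1, -0.2, -0.3, -0.1, -0.2, -0.2]

score_seen = min_k_percent_score(seen_like)
score_unseen = min_k_percent_score(unseen_like)
score_common = min_k_percent_score(common_unseen)
```

With these made-up numbers, the outlier-heavy text scores far below the training-like text, while the common-word text scores above it. Any threshold that accepts the training-like text would also accept the common-word text, which is the misclassification the abstract attributes to uncalibrated token probabilities.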

Code & Models

Repositories

zhang-wei-chao/dc-pdd (PyTorch, official)

Videos

Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method · Underline

Taxonomy

Topics: Natural Language Processing Techniques