Adaptive Pre-training Data Detection for Large Language Models via Surprising Tokens
Anqi Zhang, Chaofeng Wu

TL;DR
This paper introduces an adaptive method for detecting pre-training data in large language models by identifying surprising tokens, which improves detection accuracy without needing access to training data or additional training.
Contribution
The proposed method adaptively locates surprising tokens based on model predictions, enhancing pre-training data detection without relying on training data access or reference models.
Findings
Achieves up to 29.5% improvement over existing methods.
Effective detection without access to pre-training data or additional training.
Introduces the Dolma-Book benchmark for evaluation.
Abstract
While large language models (LLMs) are extensively used, there are rising concerns regarding privacy, security, and copyright due to their opaque training data, which brings the problem of detecting pre-training data to the table. Current solutions to this problem leverage techniques explored in machine learning privacy, such as Membership Inference Attacks (MIAs), which depend heavily on LLMs' capability for verbatim memorization. However, this reliance presents challenges, especially given the vast amount of training data and the restricted number of effective training epochs. In this paper, we propose an adaptive pre-training data detection method that alleviates this reliance and effectively amplifies the identification. Our method adaptively locates surprising tokens of the input. A token is surprising to an LLM if the prediction on the token is "certain but wrong", which…
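The abstract is cut off before the scoring rule is given, so the following is only a minimal sketch of the "certain but wrong" idea under stated assumptions. "Certain" is read as low Shannon entropy of the model's next-token distribution, and "wrong" as low probability on the token that actually occurs. The fixed thresholds (`max_entropy`, `max_target_prob`) and the averaging rule in `surprise_score` are hypothetical choices, not the paper's. The paper describes its token selection as adaptive, which these fixed cutoffs do not capture.

```python
import math
from typing import List, Optional, Sequence

# Hypothetical sketch: "certain" = low entropy, "wrong" = low probability on the
# actual token. Thresholds and the scoring rule are illustrative assumptions.


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def surprising_positions(
    dists: Sequence[Sequence[float]],
    targets: Sequence[int],
    max_entropy: float,
    max_target_prob: float,
) -> List[int]:
    """Return indices where the model is confident (entropy below max_entropy)
    yet assigns the actual next token a probability below max_target_prob."""
    positions = []
    for i, (probs, target) in enumerate(zip(dists, targets)):
        certain = entropy(probs) < max_entropy
        wrong = probs[target] < max_target_prob
        if certain and wrong:
            positions.append(i)
    return positions


def surprise_score(
    dists: Sequence[Sequence[float]],
    targets: Sequence[int],
    max_entropy: float = 0.5,
    max_target_prob: float = 0.1,
) -> Optional[float]:
    """Hypothetical membership score: mean log-probability of the actual token,
    averaged over surprising positions only. Returns None if there are none."""
    idx = surprising_positions(dists, targets, max_entropy, max_target_prob)
    if not idx:
        return None
    # Clamp to avoid log(0) when the model assigns zero probability.
    return sum(math.log(max(dists[i][targets[i]], 1e-12)) for i in idx) / len(idx)


# Toy example with a 4-token vocabulary.
dists = [
    [0.97, 0.01, 0.01, 0.01],  # confident, and the actual token (0) is the top guess
    [0.97, 0.01, 0.01, 0.01],  # confident, but the actual token (2) got only 0.01
    [0.25, 0.25, 0.25, 0.25],  # uncertain, so never counted as surprising
]
targets = [0, 2, 1]
print(surprising_positions(dists, targets, 0.5, 0.1))  # [1]
print(surprise_score(dists, targets))
```

In a real setting, each row of `dists` would come from a softmax over a causal LM's logits, with position i predicting token i+1. Reading a higher (less negative) score as evidence that the text was seen in training is an assumption consistent with the abstract's framing, not a rule the truncated abstract confirms.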
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
