Automated Detection of Pre-training Text in Black-box LLMs
Ruihan Hu, Yu-Ming Shang, Jiankun Peng, Wei Luo, Yazhe Wang, Xi Zhang

TL;DR
This paper introduces VeilProbe, an automated framework for detecting whether texts were part of an LLM's pre-training data in black-box scenarios, addressing privacy concerns without manual intervention.
Contribution
VeilProbe is the first automated method that works in the black-box setting to infer whether texts belong to an LLM's pre-training data, without human-designed questions or instructions.
Findings
Effective detection on multiple datasets
Outperforms existing black-box methods
Reduces overfitting with a prototype-based classifier
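To illustrate what a prototype-based classifier generally looks like, here is a minimal nearest-prototype sketch. It is not the paper's implementation: the two-dimensional feature vectors, the label names, and the helper functions `build_prototypes` and `predict` are all hypothetical, and the real features in VeilProbe may be built and compared differently. The idea shown is only that each class is summarized by the mean of its training vectors, so the classifier stores very few parameters, which is one common reason such classifiers are less prone to overfitting on small labeled sets.

```python
from math import dist
from statistics import fmean


def build_prototypes(features, labels):
    """Average the feature vectors of each class into one prototype per class."""
    grouped = {}
    for vec, label in zip(features, labels):
        grouped.setdefault(label, []).append(vec)
    # Column-wise mean of the vectors belonging to each label.
    return {
        label: [fmean(column) for column in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(prototypes, vec):
    """Return the label whose prototype is closest in Euclidean distance."""
    return min(prototypes, key=lambda label: dist(prototypes[label], vec))


# Toy, made-up feature vectors (assumption: higher values suggest membership).
train_features = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
train_labels = ["member", "member", "non-member", "non-member"]

prototypes = build_prototypes(train_features, train_labels)
print(predict(prototypes, [0.7, 0.75]))
print(predict(prototypes, [0.3, 0.2]))
```

With only one stored vector per class, adding a few labeled examples shifts the prototypes slightly rather than reshaping a complex decision boundary.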
Abstract
Detecting whether a given text is a member of the pre-training data of Large Language Models (LLMs) is crucial for ensuring data privacy and copyright protection. Most existing methods rely on the LLM's hidden information (e.g., model parameters or token probabilities), making them ineffective in the black-box setting, where only input and output texts are accessible. Although some methods have been proposed for the black-box setting, they require substantial manual effort, such as designing complicated questions or instructions. To address these issues, we propose VeilProbe, the first framework for automatically detecting LLMs' pre-training texts in a black-box setting without human intervention. VeilProbe utilizes a sequence-to-sequence mapping model to infer the latent mapping feature between the input text and the corresponding output suffix generated by the LLM. Then it performs the…
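The black-box constraint described in the abstract (only input and output texts are accessible) can be sketched as a data-collection step that pairs each input text with the suffix the LLM generates for it. This is a hedged illustration, not the authors' pipeline: the function `collect_probe_pair`, the `ProbePair` type, the `prefix_ratio` parameter, the choice to split the text at a word boundary, and the stand-in `fake_llm` are all assumptions introduced here. The sequence-to-sequence mapping model that consumes such pairs is not shown.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProbePair:
    input_text: str        # text sent to the black-box LLM
    generated_suffix: str  # text the LLM returned as a continuation


def collect_probe_pair(
    text: str,
    query_llm: Callable[[str], str],
    prefix_ratio: float = 0.5,
) -> ProbePair:
    """Send the leading part of `text` to the LLM and record what it generates.

    Assumption: the text is split by whitespace and the first `prefix_ratio`
    share of the words is used as the prompt.
    """
    words = text.split()
    cut = max(1, int(len(words) * prefix_ratio))
    prefix = " ".join(words[:cut])
    # Only text goes in and only text comes out: no logits, no parameters.
    return ProbePair(input_text=prefix, generated_suffix=query_llm(prefix))


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real text-only API call."""
    return prompt.upper()


pair = collect_probe_pair("the quick brown fox jumps over the lazy dog", fake_llm)
print(pair)
```

In practice `query_llm` would wrap whatever text-completion endpoint is being audited; nothing in the sketch depends on token probabilities or model internals.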
Taxonomy
Topics: Natural Language Processing Techniques
