Proving Test Set Contamination in Black Box Language Models
Yonatan Oren, Nicole Meister, Niladri Chatterji, Faisal Ladhak, Tatsunori B. Hashimoto

TL;DR
This paper introduces a method to detect test set contamination in large language models by analyzing the likelihood of canonical versus shuffled benchmark orderings, without needing access to pretraining data or model weights.
Contribution
It presents a contamination test with provable guarantees, built on two observations: the examples of an uncontaminated benchmark are exchangeable, and language models tend to memorize the order in which examples appear.
Findings
Reliable detection of contamination in models as small as 1.4 billion parameters
Effective on small test sets of only 1000 examples
Little evidence of pervasive contamination in tested models
Abstract
Large language models are trained on vast amounts of internet data, prompting concerns and speculation that they have memorized public benchmarks. Going from speculation to proof of contamination is challenging, as the pretraining data used by proprietary models are often not publicly accessible. We show that it is possible to provide provable guarantees of test set contamination in language models without access to pretraining data or model weights. Our approach leverages the fact that when there is no data contamination, all orderings of an exchangeable benchmark should be equally likely. In contrast, the tendency for language models to memorize example order means that a contaminated language model will find certain canonical orderings to be much more likely than others. Our test flags potential contamination whenever the likelihood of a canonically ordered benchmark dataset is…
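The ordering comparison described in the abstract can be illustrated with a Monte Carlo permutation test. The following is a minimal sketch under stated assumptions, not the authors' implementation: `permutation_p_value`, `make_memorizing_scorer`, and `order_blind_scorer` are hypothetical names, and the two toy scorers stand in for a real model's sequence log-likelihood, which in practice would be computed from token log-probabilities.

```python
import random
from typing import Callable, Sequence


def permutation_p_value(
    examples: Sequence[str],
    log_prob: Callable[[Sequence[str]], float],
    num_shuffles: int = 200,
    seed: int = 0,
) -> float:
    """Monte Carlo p-value for 'the canonical order is no more likely than a random one'.

    Assumption: log_prob scores a whole ordered sequence of examples.
    """
    rng = random.Random(seed)
    canonical_score = log_prob(examples)
    at_least_as_likely = 0
    for _ in range(num_shuffles):
        shuffled = list(examples)
        rng.shuffle(shuffled)
        if log_prob(shuffled) >= canonical_score:
            at_least_as_likely += 1
    # The +1 terms count the canonical ordering itself, keeping the p-value valid.
    return (1 + at_least_as_likely) / (1 + num_shuffles)


def make_memorizing_scorer(memorized: Sequence[str]) -> Callable[[Sequence[str]], float]:
    """Toy 'contaminated' scorer: rewards adjacent pairs seen in the memorized order."""
    next_of = {a: b for a, b in zip(memorized, memorized[1:])}

    def score(seq: Sequence[str]) -> float:
        return float(sum(1 for a, b in zip(seq, seq[1:]) if next_of.get(a) == b))

    return score


def order_blind_scorer(seq: Sequence[str]) -> float:
    """Toy 'clean' scorer: depends only on which examples appear, not their order."""
    return float(-sum(len(s) for s in seq))


# Hypothetical benchmark of 50 distinct examples in canonical order.
benchmark = [f"question {i}" for i in range(50)]

p_contaminated = permutation_p_value(benchmark, make_memorizing_scorer(benchmark))
p_clean = permutation_p_value(benchmark, order_blind_scorer)

print(f"memorizing scorer p-value:  {p_contaminated:.4f}")
print(f"order-blind scorer p-value: {p_clean:.4f}")
```

A small p-value says the canonical ordering is unusually likely relative to shuffled orderings, which is the signature of memorized example order. The order-blind scorer assigns every ordering the same score, so its p-value stays at 1.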
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
Methods: Sparse Evolutionary Training
