Time Travel in LLMs: Tracing Data Contamination in Large Language Models
Shahriar Golchin, Mihai Surdeanu

TL;DR
This paper introduces a method for detecting data contamination in large language models. It first flags contamination of individual instances using guided instruction prompts, then uses statistical measures to assess whether a whole dataset partition is contaminated, and its results closely match human expert judgments.
Contribution
The paper presents a new, effective approach for identifying data contamination in LLMs at both instance and partition levels, utilizing guided instruction prompts and statistical tests.
Findings
Achieves 92-100% accuracy in contamination detection, measured against manual evaluation by human experts
Identifies GPT-4 contamination with AG News, WNLI, and XSum datasets
Provides a scalable method for contamination assessment
Abstract
Data contamination, i.e., the presence of test data from downstream tasks in the training data of large language models (LLMs), is a potential major issue in measuring LLMs' real effectiveness on other tasks. We propose a straightforward yet effective method for identifying data contamination within LLMs. At its core, our approach starts by identifying potential contamination at the instance level; using this information, our approach then assesses wider contamination at the partition level. To estimate contamination of individual instances, we employ "guided instruction:" a prompt consisting of the dataset name, partition type, and the random-length initial segment of a reference instance, asking the LLM to complete it. An instance is flagged as contaminated if the LLM's output either exactly or nearly matches the latter segment of the reference. To understand if an entire partition is…
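The instance-level check described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the function names (`split_instance`, `build_guided_prompt`, `is_flagged`), and the 0.9 threshold are all hypothetical, and `difflib.SequenceMatcher` is swapped in as a simple stand-in for whatever near-match criterion the paper actually uses. The call to the LLM itself is left out; `completion` is assumed to be the model's returned text.

```python
import random
from difflib import SequenceMatcher


def split_instance(text, rng):
    """Split a reference instance into a random-length prefix and the rest."""
    words = text.split()
    if len(words) < 2:
        raise ValueError("need at least two words to split an instance")
    cut = rng.randint(1, len(words) - 1)  # both pieces stay non-empty
    return " ".join(words[:cut]), " ".join(words[cut:])


def build_guided_prompt(dataset, partition, prefix):
    """Build a prompt naming the dataset and partition (hypothetical wording)."""
    return (
        f"You are given the first piece of an instance from the {partition} "
        f"split of the {dataset} dataset. Finish the second piece of the "
        f"instance exactly as it appears in the dataset.\n\n"
        f"First piece: {prefix}\n"
        f"Second piece:"
    )


def is_flagged(completion, reference_rest, threshold=0.9):
    """Flag an instance on an exact or near match with the reference remainder.

    The character-level similarity ratio and its threshold are assumptions
    made for this sketch only.
    """
    a = " ".join(completion.split()).lower()
    b = " ".join(reference_rest.split()).lower()
    if a == b:
        return True
    return SequenceMatcher(None, a, b).ratio() >= threshold


if __name__ == "__main__":
    rng = random.Random(0)
    reference = "The quick brown fox jumps over the lazy dog near the river bank."
    prefix, rest = split_instance(reference, rng)
    print(build_guided_prompt("XSum", "test", prefix))
    # In practice `completion` would come from the LLM under test.
    completion = rest
    print("flagged:", is_flagged(completion, rest))
```

Keeping prompt construction separate from the match decision means the stand-in similarity function can be replaced by a stricter or more semantic criterion without touching the rest of the sketch.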
Peer Reviews
Decision: ICLR 2024 spotlight
The proposed method is straightforward and adaptable to a wide range of datasets.
1. I have concerns regarding the soundness of the paper's evaluation methodology. The proposed method hinges on the assumption that a data instance is contaminated in an LLM if the LLM can complete the instance based on its prefix. The paper's evaluation primarily revolves around how well the proposed methods compare to human experts under this assumption. However, these concerns raise doubts about whether the underlying assumption holds for several reasons. (1) The inability of an LLM to co
- Intuitive guided and general prompts to detect instance-level contamination.
- Approximating human expert classification for exact and approximate match using GPT-4 as a classifier, i.e., approximating semantic match.
- Validation on a known contaminated LLM.
- The authors rely on the algorithm to begin with when deciding what partitions were not leaked and should be added during fine-tuning. This has a circular dependence/assumption. (This point was addressed during discussion with the authors as a writing/explanation issue rather than a true circular dependence.)
- Different levels of data leakage are not considered. For example, would GPT-4 be detected as having seen partitions of datasets that follow well-known formats seen from other datasets if i
Originality: The paper offers a fresh perspective on assessing the capabilities of LLMs in terms of potential dataset contamination. The methodologies introduced, especially the use of GPT-4's few-shot in-context learning, are innovative. Quality: The research appears thorough with detailed evaluations using two different algorithms. The results are well-tabulated, and the comparison with ChatGPT-Cheat offers a clearer understanding of the proposed methods' effectiveness. Clarity: The paper is st
Scope: The paper focuses primarily on GPT-3.5 and GPT-4. A broader range of LLMs could provide more generalizable insights.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Layer Normalization · Softmax · Absolute Position Encodings · Residual Connection · Dense Connections · Dropout
