Towards Data Contamination Detection for Modern Large Language Models: Limitations, Inconsistencies, and Oracle Challenges
Vinay Samuel, Yue Zhou, Henry Peng Zou

TL;DR
This paper evaluates the effectiveness of current data contamination detection methods on state-of-the-art large language models across challenging benchmarks, revealing significant limitations and inconsistencies that highlight the need for more robust approaches.
Contribution
It evaluates five contamination detection methods on four state-of-the-art LLMs across eight challenging datasets, exposing the methods' limitations and inconsistencies.
Findings
Current detection methods have non-trivial limitations in their assumptions and practical applications.
Detecting contamination during instruction fine-tuning is challenging.
State-of-the-art detection techniques show limited consistency with one another.
Abstract
As large language models achieve increasingly impressive results, questions arise about whether such performance stems from genuine generalization or mere data memorization. Numerous data contamination detection methods have therefore been proposed. However, these approaches are often validated on traditional benchmarks and early-stage LLMs, leaving their effectiveness uncertain when they are applied to state-of-the-art LLMs and more challenging benchmarks. To address this gap and provide a dual investigation of SOTA LLM contamination status and detection method robustness, we evaluate five contamination detection approaches with four state-of-the-art LLMs across eight challenging datasets often used in modern LLM evaluation. Our analysis reveals that (1) current methods have non-trivial limitations in their assumptions and practical applications; (2) notable difficulties exist…
Taxonomy
Topics: Digital and Cyber Forensics · Data Quality and Management
