No Memorization, No Detection: Output Distribution-Based Contamination Detection in Small Language Models
Omer Sela (Tel Aviv University)

TL;DR
This paper evaluates output distribution-based contamination detection in small language models, showing where it fails and that probability-based methods such as perplexity detect contamination more reliably.
Contribution
The paper analyzes the failure modes of CDD (Contamination Detection via output Distribution) and demonstrates that probability-based methods outperform it at contamination detection in small models.
Findings
CDD often performs no better than chance in small models.
Perplexity and Min-k% Prob outperform CDD in contamination detection.
Fine-tuning does not reliably produce memorization detectable by CDD.
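To make the two probability-based scores named in the findings concrete, here is a minimal sketch that computes them from per-token log-probabilities. It assumes the standard definitions (perplexity as the exponential of the negative mean token log-probability, Min-k% Prob as the mean log-probability of the k% least likely tokens). The function names, the natural-log convention, and the default `k=0.2` are illustrative assumptions, not settings taken from the paper.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Exponential of the negative mean token log-probability (natural log)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def min_k_prob(token_logprobs: Sequence[float], k: float = 0.2) -> float:
    """Mean log-probability of the fraction k of least likely tokens.

    k = 0.2 is an assumed default for illustration only.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    if not 0 < k <= 1:
        raise ValueError("k must be in (0, 1]")
    # Keep at least one token so short sequences still get a score.
    n = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]
    return sum(lowest) / n
```

Under these definitions, a benchmark example the model has seen during training would tend to have lower perplexity and a higher (less negative) Min-k% Prob score than an unseen one; where to place a decision threshold is left open here.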
Abstract
CDD, or Contamination Detection via output Distribution, identifies data contamination by measuring the peakedness of a model's sampled outputs. We study the conditions under which this approach succeeds and fails on small language models ranging from 70M to 410M parameters. Using controlled contamination experiments on GSM8K, HumanEval, and MATH, we find that CDD's effectiveness depends critically on whether fine-tuning produces verbatim memorization. In the majority of conditions we test, CDD performs at chance level even when the data is verifiably contaminated and detectable by simpler methods. We show that probability-based methods, specifically perplexity and Min-k% Prob, outperform CDD in all conditions where any method exceeds chance, suggesting that CDD's peakedness-based approach is insufficient for contamination detection in small language models. Our code is available at…
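The idea of "peakedness" can be illustrated with a small sketch. It assumes one common formulation: sample several outputs for a prompt, compare each to a reference output (for example, the greedy decode) by token-level edit distance, and report the fraction of samples that fall within a small distance of the reference. The function names, the `alpha` threshold, and the length normalization are assumptions made for illustration and may differ from the configuration used in the paper.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def peakedness(
    reference: Sequence[str],
    samples: Sequence[Sequence[str]],
    alpha: float = 0.05,  # assumed threshold, for illustration only
) -> float:
    """Fraction of samples within alpha * max-length edits of the reference."""
    if not samples:
        raise ValueError("need at least one sampled output")
    close = 0
    for sample in samples:
        limit = alpha * max(len(reference), len(sample))
        if edit_distance(reference, sample) <= limit:
            close += 1
    return close / len(samples)
```

A score near 1 means the sampled outputs collapse onto the reference, which is the signature of verbatim memorization. If fine-tuning on contaminated data does not produce that collapse, a score of this kind has nothing to detect, which is consistent with the failure mode the abstract describes.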
Taxonomy
Topics: Natural Language Processing Techniques · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms
