DICE: Detecting In-distribution Contamination in LLM's Fine-tuning Phase for Math Reasoning
Shangqing Tu, Kejian Zhu, Yushi Bai, Zijun Yao, Lei Hou, Juanzi Li

TL;DR
This paper introduces DICE, a novel method that uses internal states of large language models to detect in-distribution contamination during fine-tuning, helping ensure accurate evaluation of LLMs on math reasoning benchmarks.
Contribution
DICE is the first approach to identify in-distribution contamination by analyzing internal model states, improving detection accuracy and generalization across multiple benchmarks.
Findings
DICE achieves high accuracy in detecting contamination across various LLMs.
DICE generalizes well to multiple benchmarks with similar distributions.
DICE's predictions correlate strongly with LLM fine-tuning performance.
Abstract
The advancement of large language models (LLMs) relies on evaluation using public benchmarks, but data contamination can lead to overestimated performance. Previous research focuses on detecting contamination by determining whether the model has seen the exact same data during training. In addition, prior work has shown that even training on data similar to benchmark data inflates performance, a phenomenon known as in-distribution contamination. In this work, we argue that in-distribution contamination can lead to a performance drop on out-of-distribution (OOD) benchmarks. To effectively detect in-distribution contamination, we propose DICE, a novel method that leverages the internal states of LLMs to locate and then detect the contamination. DICE first identifies the layer most sensitive to contamination, then trains a classifier on the internal states of that layer. Experiments reveal DICE's high accuracy…
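The locate-then-detect idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the synthetic hidden states, the Euclidean distance between per-layer mean states as the sensitivity criterion, and the logistic-regression classifier are all assumptions chosen for illustration, and every function name here is hypothetical.

```python
import math
import random


def make_states(rng, n_layers, n_samples, dim, shifted_layer=None, shift=0.0):
    """Synthetic stand-in for per-layer hidden states (assumption, not real LLM data)."""
    states = []
    for layer in range(n_layers):
        offset = shift if layer == shifted_layer else 0.0
        states.append([[rng.gauss(offset, 1.0) for _ in range(dim)]
                       for _ in range(n_samples)])
    return states


def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def locate_layer(clean_states, contaminated_states):
    """Locate step (assumed criterion): pick the layer whose mean states differ most."""
    distances = [math.dist(mean_vector(c), mean_vector(d))
                 for c, d in zip(clean_states, contaminated_states)]
    return max(range(len(distances)), key=distances.__getitem__)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(model, x):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_classifier(features, labels, epochs=100, lr=0.05):
    """Detect step (assumed classifier): logistic regression trained with SGD."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            err = predict((w, b), x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


rng = random.Random(0)
# Hypothetical setup: 4 layers, contamination only perturbs layer 2.
clean = make_states(rng, n_layers=4, n_samples=50, dim=8)
contaminated = make_states(rng, n_layers=4, n_samples=50, dim=8,
                           shifted_layer=2, shift=2.0)

layer = locate_layer(clean, contaminated)

# Label 0 = uncontaminated, 1 = contaminated; shuffle so SGD sees mixed classes.
data = [(x, 0) for x in clean[layer]] + [(x, 1) for x in contaminated[layer]]
rng.shuffle(data)
features = [x for x, _ in data]
labels = [y for _, y in data]

model = train_classifier(features, labels)
accuracy = sum((predict(model, x) >= 0.5) == bool(y)
               for x, y in data) / len(data)
print(f"most sensitive layer: {layer}, training accuracy: {accuracy:.2f}")
```

In a real setting the synthetic vectors would be replaced by hidden states extracted from an uncontaminated reference model and a contaminated fine-tuned model on the same inputs, and accuracy would be measured on held-out data rather than the training set.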
Taxonomy
Topics: Numerical Methods and Algorithms · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
