DICE: Detecting In-distribution Contamination in LLM's Fine-tuning Phase for Math Reasoning

Shangqing Tu; Kejian Zhu; Yushi Bai; Zijun Yao; Lei Hou; Juanzi Li

arXiv:2406.04197·cs.CL·September 24, 2024

TL;DR

This paper introduces DICE, a novel method that uses the internal states of large language models to detect in-distribution contamination introduced during fine-tuning, helping to ensure accurate evaluation of LLMs on math reasoning benchmarks.

Contribution

DICE is the first approach to identify in-distribution contamination by analyzing internal model states, improving detection accuracy and generalization across multiple benchmarks.

Findings

1. DICE achieves high accuracy in detecting contamination across various LLMs.

2. DICE generalizes well to multiple benchmarks with similar distributions.

3. DICE's predictions correlate strongly with LLM fine-tuning performance.

Abstract

The advancement of large language models (LLMs) relies on evaluation using public benchmarks, but data contamination can lead to overestimated performance. Previous research has focused on detecting contamination by determining whether the model has seen the exact same data during training. Moreover, prior work has already shown that even training on data similar to benchmark data inflates performance, a phenomenon known as in-distribution contamination. In this work, we argue that in-distribution contamination can lead to performance drops on out-of-distribution (OOD) benchmarks. To effectively detect in-distribution contamination, we propose DICE, a novel method that leverages the internal states of LLMs to locate-then-detect the contamination. DICE first identifies the layer most sensitive to contamination, then trains a classifier on the internal states of that layer. Experiments reveal DICE's high accuracy…
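The two-step locate-then-detect idea in the abstract can be illustrated with a small sketch. This is a hypothetical, self-contained illustration, not code from the thu-keg/dice repository. It assumes that hidden states have already been extracted as plain vectors (one per example per layer). It also assumes that a layer's sensitivity is measured as the Euclidean distance between the mean states of a contaminated model and an uncontaminated reference model. Finally, it substitutes plain logistic regression for whatever classifier the paper actually trains. The function names `locate_layer`, `train_detector`, and `predict` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def locate_layer(contaminated: Sequence[Sequence[Vector]],
                 reference: Sequence[Sequence[Vector]]) -> int:
    """Locate step (assumed criterion): pick the layer whose mean hidden state
    differs most between a contaminated model and a clean reference model.

    contaminated[l][i] is the hidden state of example i at layer l.
    """
    distances = [
        euclidean(mean_vector(c_layer), mean_vector(r_layer))
        for c_layer, r_layer in zip(contaminated, reference)
    ]
    return max(range(len(distances)), key=distances.__getitem__)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_detector(features: Sequence[Vector], labels: Sequence[int],
                   lr: float = 0.5, epochs: int = 500) -> Tuple[Vector, float]:
    """Detect step (stand-in classifier): batch gradient descent on the
    logistic loss. Label 1 = contaminated, 0 = clean."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = predict(w, b, x) - y  # derivative of the loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Vector, b: float, x: Vector) -> float:
    """Probability that hidden state x comes from a contaminated model."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy hidden states: 3 layers x 2 examples x 2 dimensions (made-up numbers).
contaminated = [
    [[0.1, 0.0], [0.0, 0.1]],
    [[2.0, 2.0], [2.2, 1.8]],
    [[0.5, 0.5], [0.5, 0.5]],
]
reference = [
    [[0.0, 0.1], [0.1, 0.0]],
    [[0.0, 0.0], [-0.2, 0.2]],
    [[0.0, 0.0], [0.0, 0.0]],
]

layer = locate_layer(contaminated, reference)
w, b = train_detector(contaminated[layer] + reference[layer], [1, 1, 0, 0])
print(layer)  # 1
```

The toy vectors only demonstrate the control flow. In a real setting, the per-layer states would come from forward passes of the fine-tuned and reference models on benchmark-like inputs. Restricting the classifier to a single located layer keeps it small, which is the point of locating before detecting.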

Code & Models

Repositories

thu-keg/dice (PyTorch, official)

Taxonomy

Topics: Numerical Methods and Algorithms · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification

Methods: Focus