When Benchmarks Leak: Inference-Time Decontamination for LLMs
Jianzhe Chai, Yu Zhe, Jun Sakuma

TL;DR
This paper introduces DeconIEP, a novel evaluation-time decontamination method for LLMs that applies input perturbations guided by a reference model to mitigate test set contamination effects.
Contribution
DeconIEP is a new framework that reduces the effect of benchmark test set contamination at evaluation time without significantly harming model performance on clean inputs.
Findings
DeconIEP significantly reduces performance inflation caused by data leakage.
The method maintains high utility on clean inputs with minimal degradation.
The method is effective across multiple LLMs and benchmark datasets.
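The findings above refer to two quantities: performance inflation caused by leakage, and utility lost on clean inputs. The following is a minimal sketch of one way to compute them, not the paper's definitions. It assumes inflation is measured as the accuracy gap between a contaminated model and an uncontaminated baseline on the same benchmark; the class `EvalScores`, its fields, and the functions `inflation` and `summarize` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class EvalScores:
    """Benchmark accuracies in [0, 1]. All fields are hypothetical inputs."""
    clean_model_plain: float   # uncontaminated model, no decontamination
    clean_model_decon: float   # uncontaminated model, decontamination applied
    leaked_model_plain: float  # contaminated model, no decontamination
    leaked_model_decon: float  # contaminated model, decontamination applied


def inflation(leaked_acc: float, clean_acc: float) -> float:
    """Accuracy gain attributed to leakage, relative to a clean baseline (assumed definition)."""
    return leaked_acc - clean_acc


def summarize(s: EvalScores) -> dict:
    """Report inflation before and after decontamination, plus the clean-input utility drop."""
    return {
        # Gap between the contaminated model and the clean baseline, no intervention.
        "inflation_before": inflation(s.leaked_model_plain, s.clean_model_plain),
        # Same gap after decontamination is applied to the contaminated model.
        "inflation_after": inflation(s.leaked_model_decon, s.clean_model_plain),
        # Accuracy the clean model loses when decontamination is applied to it.
        "utility_drop": s.clean_model_plain - s.clean_model_decon,
    }
```

Under these assumed definitions, a method matching the findings would show `inflation_after` well below `inflation_before` while `utility_drop` stays close to zero.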
Abstract
Benchmark-based evaluation is the de facto standard for comparing large language models (LLMs). However, its reliability is increasingly threatened by test set contamination, where test samples or their close variants leak into training data and artificially inflate reported performance. To address this issue, prior work has explored two main lines of mitigation. One line attempts to identify and remove contaminated benchmark items before evaluation, but this inevitably alters the evaluation set itself and becomes unreliable when contamination is moderate or severe. The other line preserves the benchmark and instead suppresses contaminated behavior at evaluation time; however, such interventions often interfere with normal inference and lead to noticeable performance degradation on clean inputs. We propose DeconIEP, a decontamination framework that operates entirely during evaluation by…
Taxonomy
Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
