LastingBench: Defend Benchmarks Against Knowledge Leakage

Yixiong Fang; Tianran Sun; Yuling Shi; Min Wang; Xiaodong Gu

arXiv:2506.21614 · cs.CL · September 16, 2025

TL;DR

LastingBench is a framework that makes QA benchmarks more robust to knowledge leakage: it identifies leakage points in the context and rewrites them as counterfactuals, so that models cannot rely on memorized answers and evaluations of large language models remain fair.

Contribution

LastingBench introduces a method to detect and mitigate knowledge leakage in existing benchmarks, preserving their long-term utility and fairness.

Findings

01. Significant reduction in memorization effects on QA benchmarks.

02. Improved fairness and interpretability of model evaluations.

03. Demonstrated scalability and practicality of the approach.

Abstract

The increasing complexity of large language models (LLMs) raises concerns about their ability to "cheat" on standard Question Answering (QA) benchmarks by memorizing task-specific data. This undermines the validity of benchmark evaluations, as they no longer reflect genuine model capabilities but instead the effects of data leakage. While prior work has focused on detecting such leakage, little attention has been given to mitigating its impact and preserving the long-term utility of benchmarks. In this paper, we introduce LastingBench, a novel framework designed to continuously reinforce and safeguard existing benchmarks against knowledge leakage. LastingBench identifies leakage points in the context through perturbation, then rewrites the leakage points to counterfactual ones, disrupting memorization while preserving the benchmark's original evaluative intent. Evaluations of…
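The two steps named in the abstract, perturbation-based detection of leakage points and counterfactual rewriting, can be illustrated with a minimal toy sketch. This is not the authors' implementation: the function names, the leave-one-sentence-out perturbation, and the token-overlap scorer `toy_answer_score` (a stand-in for a model's confidence in the answer) are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def split_sentences(context: str) -> List[str]:
    """Naive splitter on '.'; adequate only for this toy example."""
    return [p.strip() + "." for p in context.split(".") if p.strip()]


def toy_answer_score(context: str, answer: str) -> float:
    """Hypothetical stand-in for a model's confidence in `answer`.

    Returns the fraction of answer tokens that occur in the context.
    A real pipeline would query an LLM here instead.
    """
    ctx_tokens = set(context.lower().replace(".", " ").split())
    ans_tokens = answer.lower().split()
    if not ans_tokens:
        return 0.0
    return sum(t in ctx_tokens for t in ans_tokens) / len(ans_tokens)


def find_leakage_point(
    sentences: List[str],
    answer: str,
    score: Callable[[str, str], float],
) -> Tuple[int, float]:
    """Remove each sentence in turn and return the index whose removal
    lowers the answer score the most, together with that drop.
    Returns (-1, 0.0) if no removal lowers the score."""
    base = score(" ".join(sentences), answer)
    best_idx, best_drop = -1, 0.0
    for i in range(len(sentences)):
        perturbed = " ".join(sentences[:i] + sentences[i + 1:])
        drop = base - score(perturbed, answer)
        if drop > best_drop:
            best_idx, best_drop = i, drop
    return best_idx, best_drop


def rewrite_counterfactual(
    sentences: List[str], idx: int, old_answer: str, new_answer: str
) -> List[str]:
    """Replace the original answer in the leakage sentence with a
    counterfactual one, leaving every other sentence untouched."""
    rewritten = list(sentences)
    rewritten[idx] = rewritten[idx].replace(old_answer, new_answer)
    return rewritten


# Illustrative data, invented for this sketch.
context = (
    "Marie Curie was born in Warsaw. She won two Nobel Prizes. "
    "She later moved to France."
)
sentences = split_sentences(context)
idx, drop = find_leakage_point(sentences, "Warsaw", toy_answer_score)
revised = rewrite_counterfactual(sentences, idx, "Warsaw", "Vienna")
```

After the rewrite, the question "Where was Marie Curie born?" has the counterfactual answer "Vienna" in the revised context, so a model that answers "Warsaw" from memory rather than from the context would be marked wrong.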

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks