LatestEval: Addressing Data Contamination in Language Model Evaluation through Dynamic and Time-Sensitive Test Construction

Yucheng Li; Frank Guerin; Chenghua Lin

arXiv:2312.12343 · cs.CL · March 4, 2024 · 1 citation

TL;DR

LatestEval introduces a dynamic, time-sensitive evaluation method for language models that minimizes data contamination by using recent texts, leading to more accurate assessments of model capabilities.

Contribution

The paper presents an automated pipeline for constructing uncontaminated, recent-text-based reading comprehension evaluations to improve the robustness of language model assessments.

Findings

01 Models show negligible memorization on LatestEval

02 LatestEval reduces data contamination in evaluations

03 Benchmark results indicate more reliable model performance assessment

Abstract

Data contamination in evaluation is becoming increasingly prevalent with the emergence of language models pre-trained on very large, automatically crawled corpora. This problem poses significant challenges for the accurate assessment of model capabilities and generalisation. In this paper, we propose LatestEval, an automatic method that leverages the most recent texts to create uncontaminated reading comprehension evaluations. LatestEval avoids data contamination by using only texts published within a recent time window, ensuring no overlap with the training corpora of pre-trained language models. We develop the LatestEval automated pipeline to 1) gather the latest texts; 2) identify key information; and 3) construct questions targeting that information while removing the existing answers from the context. This encourages models to infer the answers themselves based on the remaining…
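The three pipeline steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Document` type, the 30-day default window, the naive sentence splitter, and the "longest sentence is the key information" heuristic are all assumptions made for illustration, and question writing itself is left out. The sketch only shows the shape of the idea: filter by publication date, pick a piece of key information, and remove it from the context so that it becomes the reference answer.

```python
from dataclasses import dataclass
from datetime import date, timedelta
import re


@dataclass
class Document:
    # Hypothetical container; the paper does not specify this structure.
    text: str
    published: date


def recent_documents(docs, today, window_days=30):
    """Step 1: keep only documents published within the last `window_days` days.

    The 30-day default is an assumption, not a value from the paper.
    """
    cutoff = today - timedelta(days=window_days)
    return [d for d in docs if cutoff <= d.published <= today]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def pick_key_sentence(sentences):
    """Step 2 (placeholder heuristic): treat the longest sentence as key information."""
    return max(sentences, key=len)


def build_item(doc):
    """Step 3 (partial): remove the key sentence from the context and keep it
    as the reference answer. Writing the question itself is omitted here."""
    sentences = split_sentences(doc.text)
    key = pick_key_sentence(sentences)
    context = " ".join(s for s in sentences if s != key)
    return {"context": context, "answer": key}
```

A model evaluated on such an item sees only `context` and has to infer the removed information rather than copy it, which is the property the abstract describes. A real pipeline would need a much stronger key-information selector than sentence length.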

Code & Models

Repositories

liyucheng09/latesteval (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques