Evaluating the Evaluator: Problems with SemEval-2020 Task 1 for Lexical Semantic Change Detection

Bach Phan-Tat; Kris Heylen; Dirk Geeraerts; Stefano De Pascale; Dirk Speelman

arXiv:2604.13232 · cs.CL · April 16, 2026

TL;DR

This paper critically re-examines SemEval-2020 Task 1 for lexical semantic change detection, identifying problems in operationalisation, data quality, and benchmark design that limit its validity, and suggesting improvements for future benchmarks.

Contribution

The paper provides a comprehensive critique of the benchmark's operationalisation, data quality, and design, and proposes broader theories and better practices for future research.

Findings

1. The benchmark models change as gain, loss, or redistribution of discrete senses, a framing too narrow to capture gradual, constructional, collocational, and discourse-level change.

2. Data issues such as OCR noise and tagging errors affect model performance and reproducibility.

3. Limited target sets and language coverage reduce the benchmark's realism and statistical reliability.

Abstract

This discussion paper re-examines SemEval-2020 Task 1, the most influential shared benchmark for lexical semantic change detection, through a three-part evaluative framework: operationalisation, data quality, and benchmark design. First, at the level of operationalisation, we argue that the benchmark models semantic change mainly as gain, loss, or redistribution of discrete senses. While practical for annotation and evaluation, this framing is too narrow to capture gradual, constructional, collocational, and discourse-level change. Moreover, the gold labels are outcomes of annotation decisions, clustering procedures, and threshold settings, which may limit the validity of the task. Second, at the level of data quality, we show that the benchmark is affected by substantial corpus and preprocessing problems, including OCR noise, malformed characters, truncated sentences,…
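The point that gold labels depend on threshold settings can be illustrated with a minimal sketch. It assumes a simplified rule of the kind used for binary change labels: a word counts as changed when some sense cluster is attested at most `k` times in one period and at least `n` times in the other. The function name `binary_change`, the sense counts, and the threshold values are all hypothetical and are not taken from the benchmark's data.

```python
def binary_change(counts_t1, counts_t2, k, n):
    """Return True if any sense is attested at most k times in one period
    and at least n times in the other (i.e. a sense gain or a sense loss).

    Simplified, assumed rule for illustration only.
    """
    senses = set(counts_t1) | set(counts_t2)
    for sense in senses:
        a = counts_t1.get(sense, 0)
        b = counts_t2.get(sense, 0)
        gained = a <= k and b >= n
        lost = b <= k and a >= n
        if gained or lost:
            return True
    return False


# Hypothetical annotated usage counts per sense cluster for one target word.
period_1 = {"sense_A": 40, "sense_B": 2}
period_2 = {"sense_A": 35, "sense_B": 6}

# The same annotation yields different gold labels under different thresholds.
for k, n in [(0, 1), (2, 5), (2, 10)]:
    print(f"k={k}, n={n}: changed={binary_change(period_1, period_2, k, n)}")
```

With these invented counts the word is labelled as changed only under `k=2, n=5`, which shows how a label can reflect the threshold choice as much as the underlying usage data.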
