Evaluating the Evaluator: Problems with SemEval-2020 Task 1 for Lexical Semantic Change Detection
Bach Phan-Tat, Kris Heylen, Dirk Geeraerts, Stefano De Pascale, Dirk Speelman

TL;DR
This paper critically evaluates SemEval-2020 Task 1 for lexical semantic change detection, highlighting operational, data quality, and design issues that limit its effectiveness and suggesting improvements for future benchmarks.
Contribution
It provides a comprehensive critique of the benchmark's operationalization, data quality, and design, proposing broader theories and better practices for future research.
Findings
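The benchmark models change mainly as the gain, loss, or redistribution of discrete senses, which is too narrow to capture gradual, constructional, collocational, and discourse-level change.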
Data issues like OCR noise and tagging errors affect model performance and reproducibility.
Limited target sets and language coverage reduce the benchmark's realism and statistical reliability.
Abstract
This discussion paper re-examines SemEval-2020 Task 1, the most influential shared benchmark for lexical semantic change detection, through a three-part evaluative framework: operationalisation, data quality, and benchmark design. First, at the level of operationalisation, we argue that the benchmark models semantic change mainly as gain, loss, or redistribution of discrete senses. While practical for annotation and evaluation, this framing is too narrow to capture gradual, constructional, collocational, and discourse-level change. Also, the gold labels are outcomes of annotation decisions, clustering procedures, and threshold settings, which could potentially limit the validity of the task. Second, at the level of data quality, we show that the benchmark is affected by substantial corpus and preprocessing problems, including OCR noise, malformed characters, truncated sentences,…
