The LSCD Benchmark: a Testbed for Diachronic Word Meaning Tasks
Dominik Schlechtweg, Sachin Yadav, Nikolay Arefyev

TL;DR
This paper introduces a benchmark repository that standardizes the evaluation of Lexical Semantic Change Detection (LSCD) models, so that complete models and their individual components can be compared consistently across tasks. This supports research on diachronic word meaning.
Contribution
A comprehensive, standardized LSCD benchmark that supports reproducible results, modular evaluation of model components, and systematic model improvement.
Findings
The benchmark improves the reproducibility of LSCD evaluations
Modular evaluation enables detailed analysis of individual model components
Systematic experiments advance the state of the art in LSCD
Abstract
Lexical Semantic Change Detection (LSCD) is a complex, lemma-level task, which is usually operationalized based on two subsequently applied usage-level tasks: First, Word-in-Context (WiC) labels are derived for pairs of usages. Then, these labels are represented in a graph on which Word Sense Induction (WSI) is applied to derive sense clusters. Finally, LSCD labels are derived by comparing sense clusters over time. This modularity is reflected in most LSCD datasets and models. It also leads to a large heterogeneity in modeling options and task definitions, which is exacerbated by a variety of dataset versions, preprocessing options and evaluation metrics. This heterogeneity makes it difficult to evaluate models under comparable conditions, to choose optimal model combinations or to reproduce results. Hence, we provide a benchmark repository standardizing LSCD evaluation. Through…
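The pipeline described in the abstract (WiC labels on usage pairs, a usage graph, WSI clustering, then a comparison of sense clusters over time) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the benchmark's implementation. The WiC scores are assumed to be precomputed by some model and passed in as hypothetical pair scores in [0, 1]. WSI is replaced by a deliberately simple stand-in: link two usages when their score reaches a threshold, then take connected components (the LSCD literature commonly uses more robust methods such as correlation clustering). Graded change is measured as the Jensen-Shannon divergence between the sense-cluster frequency distributions of the two time periods. All function names are illustrative.

```python
from collections import Counter
from math import log2


def wsi_connected_components(usages, wic_scores, threshold=0.5):
    """Stand-in for WSI (assumption): connect usages whose WiC score is
    >= threshold and return a cluster id for every usage (union-find)."""
    parent = {u: u for u in usages}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for (a, b), score in wic_scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)
    return {u: find(u) for u in usages}


def sense_distribution(clusters, usages):
    """Relative frequency of each sense cluster among the given usages."""
    counts = Counter(clusters[u] for u in usages)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def jensen_shannon_divergence(p, q):
    """Base-2 JSD between two discrete distributions (dicts); lies in [0, 1]."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_m(x):
        return sum(x[k] * log2(x[k] / m[k]) for k in keys if x.get(k, 0.0) > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def graded_change(usage_period, wic_scores, threshold=0.5):
    """WiC scores -> usage graph -> clusters -> JSD between periods 1 and 2.
    usage_period maps each usage id to its time period (1 or 2)."""
    usages = list(usage_period)
    clusters = wsi_connected_components(usages, wic_scores, threshold)
    p1 = sense_distribution(clusters, [u for u in usages if usage_period[u] == 1])
    p2 = sense_distribution(clusters, [u for u in usages if usage_period[u] == 2])
    return jensen_shannon_divergence(p1, p2)


# Toy example with made-up scores: each period forms its own sense cluster.
usage_period = {"a1": 1, "a2": 1, "b1": 2, "b2": 2}
scores = {("a1", "a2"): 0.9, ("b1", "b2"): 0.8, ("a1", "b1"): 0.1, ("a2", "b2"): 0.2}
print(graded_change(usage_period, scores))  # 1.0: no shared sense across periods
```

Keeping each stage in its own function mirrors the modularity the abstract describes: a different WiC model, clustering method, or change measure can be swapped in and evaluated without touching the other stages.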
Taxonomy
Topics: Speech and dialogue systems
