TL;DR
This paper introduces DWUG, the largest resource of diachronic word usage graphs, covering four languages. It is built from 100,000 human semantic proximity judgments, and its usages are clustered into senses to support the study of how word meanings change over time.
Contribution
The paper presents a novel large-scale dataset of diachronic word usage graphs with human annotations, covering four languages, and details the annotation and clustering methodology.
Findings
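Created a dataset based on 100,000 human semantic proximity judgments in four languages
Described the multi-round incremental annotation process and the choice of a clustering algorithm for grouping usages into senses
Discussed possible diachronic and synchronic uses of the dataset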
Abstract
Word meaning is notoriously difficult to capture, both synchronically and diachronically. In this paper, we describe the creation of the largest resource of graded contextualized, diachronic word meaning annotation in four different languages, based on 100,000 human semantic proximity judgments. We thoroughly describe the multi-round incremental annotation process, the choice of a clustering algorithm to group usages into senses, and possible diachronic and synchronic uses for this dataset.
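To make the idea of a word usage graph concrete, here is a minimal sketch of how pairwise proximity judgments could be aggregated into a weighted graph and grouped into senses. Everything specific in it is an assumption made for illustration: the 1 to 4 judgment scale, the 2.5 threshold, the usage identifiers, and the function names `build_usage_graph` and `cluster_usages`. The grouping step uses plain connected components as a simple stand-in; it is not the clustering algorithm chosen in the paper.

```python
from collections import defaultdict
from statistics import median


def build_usage_graph(judgments):
    """Aggregate annotator judgments per usage pair into a median edge weight.

    `judgments` is a list of (usage_a, usage_b, score) triples, with
    usage_a != usage_b. The score scale (assumed 1-4 here) is hypothetical.
    """
    by_pair = defaultdict(list)
    for a, b, score in judgments:
        by_pair[frozenset((a, b))].append(score)
    return {pair: median(scores) for pair, scores in by_pair.items()}


def cluster_usages(usages, edges, threshold=2.5):
    """Group usages by connected components over edges at or above `threshold`.

    This is a simplified stand-in for a sense clustering step, not the
    paper's algorithm. The threshold value is an assumption.
    """
    parent = {u: u for u in usages}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for pair, weight in edges.items():
        if weight >= threshold:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for u in usages:
        groups[find(u)].add(u)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical judgments for four usages of one target word.
judgments = [
    ("u1", "u2", 4), ("u1", "u2", 3),  # two annotators, same pair
    ("u3", "u4", 4),
    ("u1", "u3", 1),
    ("u2", "u3", 1),
    ("u2", "u4", 2),
]
usages = ["u1", "u2", "u3", "u4"]

edges = build_usage_graph(judgments)
clusters = cluster_usages(usages, edges)
print(clusters)
```

With these made-up judgments, the two high-proximity pairs end up in separate groups, which is the kind of sense split a usage graph is meant to expose. Comparing how many usages from each time period fall into each group would then give a rough picture of diachronic change.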
