DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages

Dominik Schlechtweg; Nina Tahmasebi; Simon Hengchen; Haim Dubossarsky; Barbara McGillivray

arXiv:2104.08540 · cs.CL · July 9, 2024

TL;DR

This paper introduces DWUG, which the authors describe as the largest resource of diachronic word usage graphs, covering four languages. It is built from 100,000 human semantic proximity judgments, with usages clustered into senses, and is intended to support the study of how word meanings change over time.

Contribution

The paper presents a novel large-scale dataset of diachronic word usage graphs with human annotations, covering four languages, and details the annotation and clustering methodology.

Findings

01 Created a dataset based on 100,000 human semantic proximity judgments

02 Described the choice of a clustering algorithm for grouping usages into senses

03 Discussed potential diachronic and synchronic applications

Abstract

Word meaning is notoriously difficult to capture, both synchronically and diachronically. In this paper, we describe the creation of the largest resource of graded contextualized, diachronic word meaning annotation in four different languages, based on 100,000 human semantic proximity judgments. We thoroughly describe the multi-round incremental annotation process, the choice for a clustering algorithm to group usages into senses, and possible - diachronic and synchronic - uses for this dataset.
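
To make the idea of a word usage graph concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the usage identifiers, the 1-4 score scale, the threshold, and the clustering step. The abstract does not name the clustering algorithm, so the sketch substitutes a simple stand-in (connected components over edges above a threshold) and should not be read as the authors' method.

```python
from collections import defaultdict

# Hypothetical judgments: (usage_a, usage_b, proximity score).
# The 1-4 scale and all values are assumptions, not data from the paper.
judgments = [
    ("u1", "u2", 4.0),
    ("u2", "u3", 3.5),
    ("u3", "u4", 1.0),
    ("u4", "u5", 4.0),
    ("u1", "u5", 1.5),
]


def build_graph(judgments):
    """Store the weighted usage graph as a nested adjacency dict."""
    graph = defaultdict(dict)
    for a, b, score in judgments:
        graph[a][b] = score
        graph[b][a] = score
    return graph


def cluster_usages(graph, threshold=2.5):
    """Stand-in clustering: connected components over edges with score >= threshold."""
    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour, score in graph[node].items():
                if score >= threshold and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters


clusters = cluster_usages(build_graph(judgments))
print(clusters)  # [['u1', 'u2', 'u3'], ['u4', 'u5']]
```

In this toy graph, usages u1 to u3 are linked by high proximity scores and form one cluster, while u4 and u5 form another; each cluster plays the role of a sense. If the usages also carried time-period labels, comparing cluster membership across periods would be one way to look at change over time.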

Code & Models

Repositories

Garrafao/WUGs
