NorDiaChange: Diachronic Semantic Change Dataset for Norwegian
Andrey Kutuzov, Samia Touileb, Petter Mæhlum, Tita Ranveig Enstad, Alexandra Wittemann

TL;DR
NorDiaChange is the first diachronic semantic change dataset for Norwegian. It covers about 80 nouns manually annotated with graded semantic change across periods tied to pre- and post-war events, oil and gas discovery, and technological developments.
Contribution
The paper introduces the first diachronic semantic change dataset for Norwegian: two subsets annotated with the DURel framework, drawing on two large historical Norwegian corpora.
Findings
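Two subsets annotated with the same procedure, usable as train and test splits for each other
Coverage of periods tied to pre- and post-war events, oil and gas discovery in Norway, and technological developments
Full release under a permissive licence, including raw annotation data and inferred diachronic word usage graphs (DWUGs)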
Abstract
We describe NorDiaChange: the first diachronic semantic change dataset for Norwegian. NorDiaChange comprises two novel subsets, covering about 80 Norwegian nouns manually annotated with graded semantic change over time. Both datasets follow the same annotation procedure and can be used interchangeably as train and test splits for each other. NorDiaChange covers the time periods related to pre- and post-war events, oil and gas discovery in Norway, and technological developments. The annotation was done using the DURel framework and two large historical Norwegian corpora. NorDiaChange is published in full under a permissive licence, complete with raw annotation data and inferred diachronic word usage graphs (DWUGs).
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Language and cultural evolution
