Semantic Change Detection for the Romanian Language
Ciprian-Octavian Truică, Victor Tudose, Elena-Simona Apostol

TL;DR
This paper compares a static word embedding model (Word2Vec) with a contextual one (ELMo) for detecting semantic change over time in English and Romanian, and finds that, depending on the corpus, the choice of model and of distance measure matters most.
Contribution
It introduces a pipeline for semantic change detection in low-resource languages like Romanian using different embedding strategies and evaluates their effectiveness.
Findings
Model choice significantly impacts semantic change detection accuracy.
Corpus characteristics influence the effectiveness of embedding models.
Contextual embeddings like ELMo can capture nuanced semantic shifts.
Abstract
Automatic semantic change methods try to identify the changes that appear over time in the meaning of words by analyzing their usage in diachronic corpora. In this paper, we analyze different strategies to create static and contextual word embedding models, i.e., Word2Vec and ELMo, on real-world English and Romanian datasets. To test our pipeline and determine the performance of our models, we first evaluate both word embedding models on an English dataset (SEMEVAL-CCOHA). Afterward, we focus our experiments on a Romanian dataset, and we underline different aspects of semantic changes in this low-resource language, such as meaning acquisition and loss. The experimental results show that, depending on the corpus, the most important factors to consider are the choice of model and the distance to calculate a score for detecting semantic change.
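The abstract says a distance is used to turn embeddings into a semantic change score, but this page does not say which distances the authors compare. As an illustration only, the sketch below uses two common choices: cosine distance between a word's two static vectors (one per time period, assumed to already live in a shared space), and the average pairwise cosine distance between a word's contextual usage vectors across periods. The function names and the toy vectors are hypothetical and are not taken from the paper.

```python
import math
from itertools import product


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def static_change_score(vec_t1, vec_t2):
    """Static case (e.g. Word2Vec): one vector per word per period.

    Assumes both vectors are already in a shared space; how the spaces
    are aligned is outside this sketch.
    """
    return cosine_distance(vec_t1, vec_t2)


def average_pairwise_distance(usages_t1, usages_t2):
    """Contextual case (e.g. ELMo): one vector per occurrence of the word.

    Returns the mean cosine distance over all cross-period usage pairs.
    """
    pairs = list(product(usages_t1, usages_t2))
    if not pairs:
        raise ValueError("each period needs at least one usage vector")
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


# Toy vectors, invented for illustration.
same = static_change_score([1.0, 0.0], [1.0, 0.0])
shifted = static_change_score([1.0, 0.0], [0.0, 1.0])
apd = average_pairwise_distance([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
```

A higher score suggests a larger change in usage between the two periods. Ranking words by such a score, or thresholding it, is one way to flag candidates for meaning acquisition or loss.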
Taxonomy
Topics: Language and cultural evolution · Authorship Attribution and Profiling
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Softmax · Bidirectional LSTM · ELMo · Focus
