Characterizing Linguistic Shifts in Croatian News via Diachronic Word Embeddings
David Dukić, Ana Barić, Marko Čuljak, Josip Jukić, Martin Tutek

TL;DR
This paper trains diachronic word embeddings on 25 years of Croatian news articles to analyze semantic shifts tied to major events. It finds that post-2020 embeddings encode increased positivity, in contrast to studies reporting a decline in mental health over the same period.
Contribution
It applies diachronic skip-gram embeddings, trained on five-year periods of a corpus of 9.5 million Croatian news articles, to quantify semantic change over 25 years, highlighting shifts tied to major societal topics.
Findings
Embeddings capture semantic shifts of terms tied to major topics such as COVID-19, Croatia joining the European Union, and technological advancements.
Post-2020 embeddings encode increased positivity in sentiment analysis tasks.
Semantic shifts reflect cultural and societal changes over time.
Abstract
Measuring how semantics of words change over time improves our understanding of how cultures and perspectives change. Diachronic word embeddings help us quantify this shift, although previous studies leveraged substantial temporally annotated corpora. In this work, we use a corpus of 9.5 million Croatian news articles spanning the past 25 years and quantify semantic change using skip-gram word embeddings trained on five-year periods. Our analysis finds that word embeddings capture linguistic shifts of terms pertaining to major topics in this timespan (COVID-19, Croatia joining the European Union, technological advancements). We also find evidence that embeddings from post-2020 encode increased positivity in sentiment analysis tasks, contrasting studies reporting a decline in mental health over the same period.
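The idea of comparing a word across five-year periods can be illustrated with a minimal sketch. The abstract does not say which shift measure the authors use, so the neighbour-overlap score below is an assumption chosen because it needs no alignment between embedding spaces. The function names, the start year passed to period_index, and the tiny two-dimensional vectors are all hypothetical and stand in for real per-period skip-gram embeddings.

```python
import math


def period_index(year, start_year, width=5):
    """Map a publication year to a zero-based bucket of `width` years."""
    if year < start_year:
        raise ValueError("year precedes the start of the corpus")
    return (year - start_year) // width


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbours(word, emb, k):
    """Return the set of the k words most similar to `word` within one period."""
    target = emb[word]
    scored = [(cosine(target, vec), other)
              for other, vec in emb.items() if other != word]
    scored.sort(reverse=True)
    return {other for _, other in scored[:k]}


def neighbour_shift(word, emb_old, emb_new, k=2):
    """Jaccard distance between a word's neighbour sets in two periods.

    0.0 means identical neighbours, 1.0 means no neighbour in common.
    This is an illustrative measure, not necessarily the paper's.
    """
    old = nearest_neighbours(word, emb_old, k)
    new = nearest_neighbours(word, emb_new, k)
    return 1.0 - len(old & new) / len(old | new)


# Invented toy embeddings: "virus" sits near computing terms in the older
# period and near health terms in the newer one.
emb_old = {
    "virus": [0.9, 0.1],
    "software": [1.0, 0.0],
    "computer": [0.95, 0.05],
    "disease": [0.0, 1.0],
    "pandemic": [0.1, 0.9],
}
emb_new = dict(emb_old, virus=[0.2, 0.8])

virus_shift = neighbour_shift("virus", emb_old, emb_new)
software_shift = neighbour_shift("software", emb_old, emb_new)
print(virus_shift, software_shift)
```

In this toy setup the neighbours of "virus" change completely between the two periods while those of "software" do not, which is the kind of contrast a diachronic analysis looks for. With real data, each dictionary would be replaced by embeddings trained separately on the articles falling into one period bucket.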
Taxonomy
Topics: Authorship Attribution and Profiling · Natural Language Processing Techniques · Language and cultural evolution
