Statistically significant detection of semantic shifts using contextual word embeddings

Yang Liu; Alan Medlar; Dorota Glowacka

arXiv:2104.03776 · cs.CL · February 23, 2022

Open Access

TL;DR

This paper introduces a statistically rigorous method for detecting semantic shifts in words over time using contextual embeddings and permutation tests, improving robustness especially in small datasets.

Contribution

It combines contextual word embeddings with permutation-based statistical tests and false discovery rate correction to reliably identify semantic change.

Findings

01 High precision in simulation tests by reducing false positives

02 Improved robustness of semantic shift estimates in real-world data

03 Effective detection of semantic change in small datasets

Abstract

Detecting lexical semantic change in smaller data sets, e.g. in historical linguistics and digital humanities, is challenging due to a lack of statistical power. This issue is exacerbated by non-contextual embedding models that produce one embedding per word and, therefore, mask the variability present in the data. In this article, we propose an approach to estimate semantic shift by combining contextual word embeddings with permutation-based statistical tests. We use the false discovery rate procedure to address the large number of hypothesis tests being conducted simultaneously. We demonstrate the performance of this approach in simulation where it achieves consistently high precision by suppressing false positives. We additionally analyze real-world data from SemEval-2020 Task 1 and the Liverpool FC subreddit corpus. We show that by taking sample variation into account, we can…
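The combination the abstract describes, a permutation test per word followed by false discovery rate control across all words, can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the test statistic (cosine distance between the mean usage embeddings of the two time periods) and the choice of the Benjamini-Hochberg procedure for FDR control are assumptions, and the function names `permutation_p_value` and `benjamini_hochberg` are hypothetical. Each input is assumed to be a list of contextual embedding vectors, one per occurrence of the word, produced beforehand by some contextual model.

```python
import math
import random


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def permutation_p_value(emb_a, emb_b, n_perm=1000, seed=0):
    """P-value for one word: is the observed shift larger than chance?

    emb_a, emb_b: usage embeddings of the word in period A and period B.
    The statistic (distance between period means) is an assumed choice.
    """
    rng = random.Random(seed)
    observed = cosine_distance(mean_vector(emb_a), mean_vector(emb_b))
    pooled = list(emb_a) + list(emb_b)
    n_a = len(emb_a)
    exceed = 0
    for _ in range(n_perm):
        # Shuffle period labels by re-splitting the pooled usages.
        rng.shuffle(pooled)
        stat = cosine_distance(mean_vector(pooled[:n_a]),
                               mean_vector(pooled[n_a:]))
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (exceed + 1) / (n_perm + 1)


def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean reject flags, one per p-value, at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        # Largest rank k with p_(k) <= alpha * k / m.
        if p_values[i] <= alpha * rank / m:
            cutoff = rank
    reject = [False] * m
    for i in order[:cutoff]:
        reject[i] = True
    return reject
```

In use, one would compute `permutation_p_value` for every target word and pass the resulting list to `benjamini_hochberg`; words flagged `True` would be the reported semantic shifts. Because the permutation test pools individual usages, a word with few or highly variable occurrences yields a large p-value rather than a spuriously large shift score.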

Taxonomy

Topics: Language and cultural evolution · Complex Network Analysis Techniques · Advanced Text Analysis Techniques