Probabilistic Method of Measuring Linguistic Productivity

Sergei Monakhov

arXiv:2308.12643 · cs.CL · August 25, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a probabilistic method for measuring linguistic productivity that assesses an affix's ability to form new words independently of token frequency, using corpus-based simulation and evaluation on English and Russian data.

Contribution

It proposes a novel, corpus-based probabilistic approach to measure linguistic productivity that accounts for neologisms and is not biased by token frequency.

Findings

01

Productivity correlates with the number of word types.

02

High-frequency items increase first, followed by low-frequency items.

03

The method provides new insights into linguistic productivity dynamics.

Abstract

In this paper I propose a new way of measuring linguistic productivity that objectively assesses the ability of an affix to coin new complex words and, unlike other popular measures, does not depend directly on token frequency. Specifically, I suggest that linguistic productivity may be viewed as the probability that an affix combines with a random base. This approach has three advantages. First, token frequency does not dominate the productivity measure but naturally influences the sampling of bases. Second, rather than simply counting attested word types with an affix, the method simulates the construction of these types and then checks whether they are attested in the corpus. Third, the corpus-based approach and randomised design ensure that true neologisms and words coined long ago have equal chances of being selected. The proposed algorithm is evaluated…
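The idea of productivity as the probability that an affix combines with a random base can be illustrated with a small Monte Carlo sketch. This is not the paper's implementation: the function name `estimate_productivity`, the toy counts, and the use of plain string concatenation to build candidate words (which ignores allomorphy and spelling changes) are all assumptions made for illustration.

```python
import random
from collections import Counter


def estimate_productivity(affix, base_counts, attested, n_samples=10_000, seed=0):
    """Estimate P(base + affix is attested) for a randomly sampled base.

    Hypothetical helper, not taken from the paper. Candidate words are
    built by naive suffix concatenation, which is an assumption.
    """
    rng = random.Random(seed)
    bases = list(base_counts)
    weights = [base_counts[b] for b in bases]
    # Token frequency enters only here: frequent bases are drawn more often,
    # but it is not part of the final ratio itself.
    sampled = rng.choices(bases, weights=weights, k=n_samples)
    # Simulate constructing each candidate word, then check attestation.
    hits = sum((base + affix) in attested for base in sampled)
    return hits / n_samples


# Toy data (invented for illustration, not corpus figures).
base_counts = Counter({"kind": 50, "dark": 30, "table": 20})
attested = {"kindness", "darkness"}

p_ness = estimate_productivity("ness", base_counts, attested)
p_ity = estimate_productivity("ity", base_counts, attested)
```

With these toy counts, 80 of every 100 base tokens yield an attested form with "ness", so `p_ness` should land close to 0.8, while `p_ity` is exactly 0.0 because no candidate is attested. Using a seeded `random.Random` keeps the estimate reproducible across runs.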

Taxonomy

Topics: Natural Language Processing Techniques · Authorship Attribution and Profiling · Linguistics, Language Diversity, and Identity