Dependence of exponents on text length versus finite-size scaling for word-frequency distributions

Alvaro Corral; Francesc Font-Clos

arXiv:1804.03718·physics.data-an·April 12, 2018

TL;DR

This paper presents quantitative evidence that the finite-size scaling law for word-frequency distributions holds, rebuts claims that the law is conceptually invalid, and clarifies misconceptions about scaling in linguistic data.

Contribution

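It validates the finite-size scaling law for word-frequency distributions with statistical tests and with analytical arguments based on the generalized central-limit theorem, and it shows that the power-law tail exponents stay stable at values close to 2 as text length changes.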

Findings

01 The finite-size scaling law is valid for word-frequency distributions.

02 The power-law exponents of the distribution tails are stable and close to 2, in agreement with Zipf's law.

03 The claim that the exponents decrease with text length does not hold up under rigorous statistical analysis.

Abstract

Some authors have recently argued that a finite-size scaling law for the text-length dependence of word-frequency distributions cannot be conceptually valid. Here we give solid quantitative evidence for the validity of such scaling law, both using careful statistical tests and analytical arguments based on the generalized central-limit theorem applied to the moments of the distribution (and obtaining a novel derivation of Heaps' law as a by-product). We also find that the picture of word-frequency distributions with power-law exponents that decrease with text length [Yan and Minnhagen, Physica A 444, 828 (2016)] does not stand with rigorous statistical analysis. Instead, we show that the distributions are perfectly described by power-law tails with stable exponents, whose values are close to 2, in agreement with the classical Zipf's law. Some misconceptions about scaling are also…
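
To illustrate what a power-law tail with an exponent close to 2 looks like in practice, here is a minimal sketch that draws synthetic samples from a continuous power law and recovers the exponent with the standard continuous maximum-likelihood estimator. It is not the authors' procedure. The function names, the seed, the sample size, and the lower cutoff `x_min` are assumptions chosen for illustration, and the data are synthetic rather than taken from any text. Word frequencies are discrete, so a real analysis would need a discrete fit and a tested choice of cutoff.

```python
import math
import random


def sample_power_law(n, alpha, x_min, rng):
    """Draw n samples from a continuous power law p(x) ~ x^(-alpha), x >= x_min."""
    # Inverse-transform sampling: x = x_min * (1 - u)^(-1 / (alpha - 1)),
    # with u uniform in [0, 1), so 1 - u is never zero.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def estimate_tail_exponent(values, x_min):
    """Continuous maximum-likelihood estimate of alpha for the tail x >= x_min."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum


# Hypothetical setup: 20000 synthetic "frequencies" with true exponent 2.
rng = random.Random(0)
samples = sample_power_law(20000, alpha=2.0, x_min=1.0, rng=rng)
alpha_hat = estimate_tail_exponent(samples, x_min=1.0)
print(f"estimated exponent: {alpha_hat:.3f}")
```

With a sample of this size the estimate should land near the true value of 2, since the standard error of this estimator is roughly (alpha - 1) / sqrt(n). Repeating the fit on samples of different sizes is a simple way to see that the estimated exponent of a genuine power law does not drift with sample size.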
