Co-occurrence of the Benford-like and Zipf Laws Arising from the Texts Representing Human and Artificial Languages

Evgeny Shulzinger; Irina Legchenkova; Edward Bormashenko

arXiv:1803.03667 · cs.CL · March 13, 2018 · 1 cites

Open Access

TL;DR

This study shows that large texts in human (English, Russian, Ukrainian) and artificial (C++, Java) languages follow both the Benford-like and Zipf laws, and that the slopes of the double-logarithmic plots are markedly steeper for artificial languages than for human ones.

Contribution

The paper demonstrates that the Benford-like and Zipf laws co-occur in large texts across human and artificial languages, and that the two laws respond differently when the most popular words are removed.

Findings

01. The Zipf law holds: the frequency of a word is inversely proportional to its rank.

02. The Benford-like distribution of leading numbers is unaffected by removing the most popular words.

03. Artificial languages (C++, Java) show markedly larger slope moduli in double-logarithmic plots than human languages.
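To make the slope comparison in the findings concrete, here is a minimal Python sketch of how a Zipf slope could be estimated from a text. It is not the authors' procedure: the tokenizer (`\w+` on lowercased text), the helper names `word_counts` and `zipf_slope`, the ordinary least-squares fit, and the choice to keep original ranks after skipping the top words are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Count word tokens in a text.

    Assumption: a token is any run of word characters, lowercased.
    The paper's own tokenization rules are not given in this summary.
    """
    return Counter(re.findall(r"\w+", text.lower()))


def zipf_slope(counts, skip_top=0):
    """Least-squares slope of log(frequency) against log(rank).

    skip_top drops that many of the most frequent words before fitting.
    Assumption: the remaining words keep their original ranks.
    """
    freqs = sorted(counts.values(), reverse=True)
    points = [
        (math.log(rank), math.log(freq))
        for rank, freq in enumerate(freqs, start=1)
        if rank > skip_top
    ]
    if len(points) < 2:
        raise ValueError("need at least two ranked words to fit a slope")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx
```

A slope close to -1 corresponds to the inverse proportionality stated in the first finding; the third finding says the modulus of this slope is larger for C++ and Java texts than for the human-language texts.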

Abstract

We demonstrate that large texts representing human (English, Russian, Ukrainian) and artificial (C++, Java) languages display quantitative patterns characterized by the Benford-like and Zipf laws. Under the Zipf law, the frequency of a word is inversely proportional to its rank, whereas the total number of occurrences of each word in the text generates an uneven, Benford-like distribution of leading numbers. Excluding the most popular words substantially improves the correlation of the actual textual data with the Zipfian distribution, whereas the Benford-like distribution of leading numbers (arising from the total number of occurrences of each word) is insensitive to the same elimination procedure. The calculated moduli of the slopes of the double-logarithmic plots for artificial languages (C++, Java) are markedly larger than those for human ones.
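The leading-number statistic described in the abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the helper names are hypothetical, the input is assumed to be a mapping from each word to its total number of occurrences, and the classical Benford formula log10(1 + 1/d) is included only as a reference curve, since the abstract speaks of a "Benford-like" distribution rather than an exact match.

```python
import math
from collections import Counter


def leading_digit_distribution(counts):
    """Share of words whose occurrence count starts with each digit 1..9.

    counts maps each word to its total number of occurrences in the text.
    """
    digits = Counter(int(str(c)[0]) for c in counts.values() if c > 0)
    total = sum(digits.values())
    if total == 0:
        raise ValueError("no positive word counts to analyse")
    return {d: digits.get(d, 0) / total for d in range(1, 10)}


def benford_reference():
    """Classical Benford probabilities, used here only for comparison."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def drop_most_frequent(counts, k):
    """Return the word counts with the k most frequent words removed.

    Assumption: this mirrors the 'elimination procedure' only loosely;
    the number of words the authors removed is not stated in this summary.
    """
    return dict(Counter(counts).most_common()[k:])
```

Comparing `leading_digit_distribution(counts)` with `leading_digit_distribution(drop_most_frequent(counts, k))` for some small `k` is one way to probe the insensitivity to word removal that the abstract reports.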

Taxonomy

Topics: Benford’s Law and Fraud Detection · Authorship Attribution and Profiling · Complex Systems and Time Series Analysis