Zipf's law and log-normal distributions in measures of scientific output across fields and institutions: 40 years of Slovenia's research as an example
Matjaz Perc

TL;DR
This study analyzes 40 years of Slovenian research data, revealing Zipfian distributions in citation counts of individual publications and log-normal distributions in researcher success metrics, highlighting the statistical properties of scientific output.
Contribution
It demonstrates that publication citations follow Zipf's law while researcher success indices fit a log-normal distribution, providing new insights into the statistical nature of scientific impact measures.
Findings
Citation distributions of individual publications follow Zipf's law (a power law) with exponents between 2.4 and 3.1, depending on the institution and field of research.
Researcher success indices like h-index and g-index are best modeled by log-normal distributions.
Exponential distributions can appear in special cases, such as strongly hierarchical institutions and research fields with high self-citation rates.
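For readers less familiar with the two researcher-level indices named in the findings, the following is a minimal sketch of how they are commonly defined: the h-index is the largest h such that h publications each have at least h citations, and the g-index is the largest g such that the g most cited publications together have at least g² citations. The function names and the sample citation list are illustrative assumptions and are not taken from the paper or from SICRIS; here g is capped at the number of publications.

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g publications total at least g**2 citations.

    Assumption: g is capped at the number of publications.
    """
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


# Hypothetical citation counts for one researcher (not real data).
example = [10, 8, 5, 4, 3]
print(h_index(example), g_index(example))
```

Computing these indices for every researcher in a data set yields the per-researcher distributions to which the findings refer.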
Abstract
Slovenia's Current Research Information System (SICRIS) currently hosts 86,443 publications with citation data from 8,359 researchers working on the whole plethora of social and natural sciences from 1970 till present. Using these data, we show that the citation distributions derived from individual publications have Zipfian properties in that they can be fitted by a power law $P(x) \sim x^{-\alpha}$, with $\alpha$ between 2.4 and 3.1 depending on the institution and field of research. Distributions of indexes that quantify the success of researchers rather than individual publications, on the other hand, cannot be associated with a power law. We find that for Egghe's g-index and Hirsch's h-index the log-normal form $P(x) \sim \exp[-a \ln x - b (\ln x)^2]$ applies best, with $a$ and $b$ depending moderately on the underlying set of researchers. In special cases, particularly for…
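To make the two distribution families in the abstract concrete, here is a hedged sketch of how a power-law exponent and log-normal parameters could be estimated from data. It is not the paper's fitting procedure: it uses synthetic samples rather than SICRIS data, treats the values as continuous (real citation counts are integers), and uses the standard maximum-likelihood estimator for a continuous power law above an assumed cutoff `x_min`. All function names and parameter values are illustrative assumptions.

```python
import numpy as np


def sample_power_law(alpha, x_min, size, rng):
    """Draw from a continuous power law p(x) ~ x**(-alpha), x >= x_min,
    by inverse-transform sampling (requires alpha > 1)."""
    u = rng.random(size)  # uniform on [0, 1)
    return x_min * (1.0 - u) ** (-1.0 / (alpha - 1.0))


def fit_power_law_exponent(x, x_min):
    """Maximum-likelihood estimate of alpha for a continuous power law,
    using only observations at or above the assumed cutoff x_min."""
    tail = np.asarray(x, dtype=float)
    tail = tail[tail >= x_min]
    return 1.0 + tail.size / np.sum(np.log(tail / x_min))


def fit_log_normal(x):
    """Maximum-likelihood estimates (mu, sigma) of a log-normal:
    the mean and standard deviation of ln(x)."""
    logs = np.log(np.asarray(x, dtype=float))
    return logs.mean(), logs.std()


rng = np.random.default_rng(0)

# Synthetic "citation counts" with an exponent inside the reported 2.4-3.1 range.
citations = sample_power_law(alpha=2.7, x_min=1.0, size=50_000, rng=rng)
alpha_hat = fit_power_law_exponent(citations, x_min=1.0)

# Synthetic "researcher indices" drawn from a log-normal distribution.
indices = rng.lognormal(mean=1.5, sigma=0.6, size=50_000)
mu_hat, sigma_hat = fit_log_normal(indices)

print(f"alpha ~ {alpha_hat:.2f}, mu ~ {mu_hat:.2f}, sigma ~ {sigma_hat:.2f}")
```

With real data, the choice of `x_min` and the discreteness of citation counts both affect the estimated exponent, so the numbers from a sketch like this should be compared with the paper's reported values only with care.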
