Distributions for cited articles from individual subjects and years
Mike Thelwall, Paul Wilson

TL;DR
This study compares statistical distributions for modelling the citation counts of academic articles from a single subject and year. It finds that the hooked power law and the lognormal distribution are the most suitable, although their parameter estimates are often unreliable.
Contribution
It evaluates how well discrete versions of the power law, lognormal, and hooked power law distributions fit citation data from 20 Scopus categories, highlighting the limitations of power law models.
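For reference, here is one common way to write the three models as discrete distributions on citation counts x = 1, 2, 3, ... (uncited articles excluded). The parameterisation is an assumption and may differ from the one used in the paper. Setting B = 0 in the hooked power law recovers the power law. The lognormal is discretised by evaluating the continuous density at integer counts and renormalising.

```latex
\begin{align*}
p_{\mathrm{PL}}(x)  &= \frac{x^{-\alpha}}{\zeta(\alpha)}, \qquad \zeta(\alpha) = \sum_{k=1}^{\infty} k^{-\alpha}, \quad \alpha > 1 \\
p_{\mathrm{HPL}}(x) &= \frac{(B + x)^{-\alpha}}{\sum_{k=1}^{\infty} (B + k)^{-\alpha}}, \qquad B \ge 0, \quad \alpha > 1 \\
p_{\mathrm{LN}}(x)  &= \frac{x^{-1} \exp\!\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right)}
                           {\sum_{k=1}^{\infty} k^{-1} \exp\!\left(-\frac{(\ln k - \mu)^2}{2\sigma^2}\right)}, \qquad \sigma > 0
\end{align*}
```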
Findings
The power law is unsuitable for citation data from a single subject and year, even for estimating the slope of the tail.
The hooked power law and the lognormal distribution fit best for some categories.
Parameter estimates for these models are often unreliable.
Abstract
The citations to a set of academic articles are typically unevenly shared, with many articles attracting few citations and few attracting many. It is important to know more precisely how citations are distributed in order to help statistical analyses of citations, especially for sets of articles from a single discipline and a small range of years, as normally used for research evaluation. This article fits discrete versions of the power law, the lognormal distribution and the hooked power law to 20 different Scopus categories, using citations to articles published in 2004 and ignoring uncited articles. The results show that, despite its popularity, the power law is not a suitable model for collections of articles from a single subject and year, even for the purpose of estimating the slope of the tail of the citation data. Both the hooked power law and the lognormal distributions fit…
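The fitting step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. All function names are hypothetical. Each normalising constant is approximated by summing the first 1,000 terms and adding an integral for the tail. The log-likelihood is maximised with a crude golden-section and coordinate search rather than whatever optimiser the paper used. The example citation counts are made up for illustration.

```python
import math
from collections import Counter

# Sketch only: all names here are hypothetical, not the authors' implementation.
# Citation counts must be >= 1 because uncited articles are excluded (zero-truncation).
CUTOFF = 1000  # sum weights exactly for k <= CUTOFF; approximate the rest by an integral


def _log_norm(log_weight, log_tail):
    """log of sum_{k>=1} w(k): exact terms up to CUTOFF plus a tail integral from CUTOFF + 0.5."""
    logs = [log_weight(k) for k in range(1, CUTOFF + 1)] + [log_tail(CUTOFF + 0.5)]
    m = max(logs)  # log-sum-exp trick to avoid underflow
    return m + math.log(math.fsum(math.exp(v - m) for v in logs))


def _loglik(counts, log_weight, log_tail):
    data = math.fsum(m * log_weight(x) for x, m in Counter(counts).items())
    return data - len(counts) * _log_norm(log_weight, log_tail)


def power_loglik(counts, alpha):  # p(x) proportional to x^-alpha, alpha > 1
    return _loglik(counts, lambda x: -alpha * math.log(x),
                   lambda t: (1 - alpha) * math.log(t) - math.log(alpha - 1))


def hooked_loglik(counts, b, alpha):  # p(x) proportional to (b + x)^-alpha; b = 0 is the power law
    return _loglik(counts, lambda x: -alpha * math.log(b + x),
                   lambda t: (1 - alpha) * math.log(b + t) - math.log(alpha - 1))


def lognormal_loglik(counts, mu, sigma):  # p(x) proportional to exp(-(ln x - mu)^2 / (2 sigma^2)) / x
    def log_weight(x):
        lx = math.log(x)
        return -lx - (lx - mu) ** 2 / (2 * sigma ** 2)

    def log_tail(t):  # integral of the weight from t to infinity, via the normal upper tail
        q = 0.5 * math.erfc((math.log(t) - mu) / (sigma * math.sqrt(2)))
        return math.log(sigma * math.sqrt(2 * math.pi) * q) if q > 0 else -math.inf

    return _loglik(counts, log_weight, log_tail)


def _golden_max(f, lo, hi, tol=1e-4):
    """Golden-section search for a maximum of f on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2


def _coord_ascent(f, start, bounds, rounds=6):
    """Maximise f over two parameters by alternating 1-D searches, keeping only improvements."""
    p, best = list(start), f(*start)
    for _ in range(rounds):
        for i, (lo, hi) in enumerate(bounds):
            def along(v, i=i):
                q = list(p)
                q[i] = v
                return f(*q)
            v = _golden_max(along, lo, hi)
            val = along(v)
            if val > best:
                p[i], best = v, val
    return tuple(p), best


def fit_all(counts):
    """Crude maximum-likelihood fits; returns {model: (params, max log-likelihood)}."""
    if min(counts) < 1:
        raise ValueError("remove uncited articles (count 0) before fitting")
    alpha = _golden_max(lambda a: power_loglik(counts, a), 1.01, 6.0)
    # Start the hooked power law at the fitted power law (b = 0), so it can only improve.
    hooked = _coord_ascent(lambda b, a: hooked_loglik(counts, b, a),
                           (0.0, alpha), [(0.0, 50.0), (1.01, 10.0)])
    logs = [math.log(x) for x in counts]
    mu0 = sum(logs) / len(logs)
    sd0 = max(0.2, math.sqrt(sum((v - mu0) ** 2 for v in logs) / len(logs)))
    lognormal = _coord_ascent(lambda m, s: lognormal_loglik(counts, m, s),
                              (mu0, sd0), [(-5.0, 10.0), (0.2, 10.0)])
    return {"power": ((alpha,), power_loglik(counts, alpha)),
            "hooked": hooked, "lognormal": lognormal}


if __name__ == "__main__":
    # Hypothetical citation counts for illustration only (not data from the paper).
    cites = [1, 1, 1, 2, 2, 3, 3, 4, 5, 6, 8, 10, 13, 17, 25, 40, 90]
    for name, (params, ll) in fit_all(cites).items():
        print(f"{name:10s} params={params} loglik={ll:.2f}")
```

The hooked power law contains the power law as the special case B = 0, and the search above starts from the fitted power law. Its maximised log-likelihood therefore cannot be lower than the power law's. Whether the extra parameter is worth it is a separate question that this sketch does not settle.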
