Are the discretised lognormal and hooked power law distributions plausible for citation data?
Mike Thelwall

TL;DR
This study evaluates whether the discretised lognormal and hooked power law distributions are plausible models for the full range of citation counts. Fitted to 23 Scopus subcategories, both distributions failed a Kolmogorov-Smirnov goodness of fit test in over three quarters of cases, which calls their appropriateness into question.
Contribution
It provides an empirical assessment of the plausibility of two popular distributions for citation counts across multiple subcategories, highlighting their limitations.
Findings
Both distributions failed goodness of fit tests in over 75% of cases.
The discretised lognormal distribution seems to have the wrong shape for citation data, predicting too few zeros and too few medium values.
Impurities and interdisciplinary research may affect distribution fitting.
Abstract
There is no agreement over which statistical distribution is most appropriate for modelling citation count data. This is important because if one distribution is accepted then the relative merits of different citation-based indicators, such as percentiles, arithmetic means and geometric means, can be more fully assessed. In response, this article investigates the plausibility of the discretised lognormal and hooked power law distributions for modelling the full range of citation counts, with an offset of 1. The citation counts from 23 Scopus subcategories were fitted to hooked power law and discretised lognormal distributions but both distributions failed a Kolmogorov-Smirnov goodness of fit test in over three quarters of cases. The discretised lognormal distribution also seems to have the wrong shape for citation distributions, with too few zeros and not enough medium values for all…
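The comparison described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the article's code. It assumes the usual forms of the two models applied to x = citations + 1 (the offset of 1): a lognormal density evaluated at positive integers and renormalised, and a hooked power law proportional to 1/(B + x)^alpha. The function names, the truncation point `max_x`, the parameter values and the toy data are all hypothetical, and no parameter fitting or p-value calculation is attempted.

```python
import math


def discretised_lognormal_pmf(mu, sigma, max_x=10000):
    """Probabilities for x = 1..max_x, where x = citations + 1.

    Assumption: the lognormal density is evaluated at integers and
    renormalised over a truncated range (max_x is a hypothetical cutoff).
    """
    norm = sigma * math.sqrt(2 * math.pi)
    weights = [
        math.exp(-((math.log(x) - mu) ** 2) / (2 * sigma ** 2)) / (x * norm)
        for x in range(1, max_x + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def hooked_power_law_pmf(alpha, b, max_x=10000):
    """Probabilities proportional to 1 / (b + x)^alpha for x = 1..max_x."""
    weights = [(b + x) ** -alpha for x in range(1, max_x + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def ks_statistic(citations, pmf):
    """Largest gap between the empirical and model CDFs.

    Citation counts are shifted by 1 to match the pmf index, so every
    count must be below len(pmf).
    """
    n = len(citations)
    counts = {}
    for c in citations:
        counts[c + 1] = counts.get(c + 1, 0) + 1
    emp_cdf = 0.0
    model_cdf = 0.0
    d = 0.0
    for x in range(1, max(counts) + 1):
        emp_cdf += counts.get(x, 0) / n
        model_cdf += pmf[x - 1]
        d = max(d, abs(emp_cdf - model_cdf))
    return d


# Toy citation counts (hypothetical, not from the article).
toy_citations = [0, 0, 0, 1, 1, 2, 3, 5, 8, 21]
lognormal = discretised_lognormal_pmf(mu=1.0, sigma=1.0)
hooked = hooked_power_law_pmf(alpha=3.0, b=10.0)
print("KS distance, discretised lognormal:", ks_statistic(toy_citations, lognormal))
print("KS distance, hooked power law:", ks_statistic(toy_citations, hooked))
```

In a real analysis the parameters would be estimated from the data (for example by maximum likelihood) before the Kolmogorov-Smirnov distance is assessed, and the reference distribution for the test would need to account for that estimation.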
Taxonomy
Topics: scientometrics and bibliometrics research · Data Analysis with R · Meta-analysis and systematic reviews
