The discretised lognormal and hooked power law distributions for complete citation data: Best options for modelling and regression
Mike Thelwall

TL;DR
This study compares the discretised lognormal and hooked power law distributions for modelling citation data across 26 Scopus subject areas. It finds that the hooked power law fits better for older articles in most sciences and recommends OLS regression on log-transformed citation counts for analysis.
Contribution
It compares the two distributions on complete sets of articles, including uncited ones, across 26 Scopus subject areas in seven different years, and offers practical regression recommendations.
Findings
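Hooked power law fits better in two thirds of the subject/year combinations for articles at least three years old, including most medical, life and natural sciences.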
Discretised lognormal fits better for arts and social sciences.
OLS regression on log-transformed citation counts is recommended for younger articles.
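
As a rough illustration of the regression recommendation, the sketch below fits ordinary least squares to ln(1 + citations) against a single predictor. It is a minimal example and not the paper's code: the function name `fit_log_ols`, the predictor `n_authors`, and the toy numbers are all hypothetical, and adding 1 before taking logs is assumed here so that uncited articles can be included.

```python
import numpy as np

def fit_log_ols(citations, predictor):
    """Fit OLS of ln(1 + citations) on one predictor.

    Returns (intercept, slope). The +1 offset keeps uncited
    articles (zero citations) in the model, since ln(0) is undefined.
    """
    y = np.log1p(np.asarray(citations, dtype=float))  # ln(1 + c)
    x = np.asarray(predictor, dtype=float)
    # Design matrix with a column of ones for the intercept.
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[0]), float(coef[1])

# Hypothetical toy data, for illustration only (not from the paper).
citations = [0, 1, 0, 3, 7, 2, 12, 5]
n_authors = [1, 1, 2, 2, 3, 3, 4, 4]
intercept, slope = fit_log_ols(citations, n_authors)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

Under this model, a slope b means that each unit increase in the predictor multiplies the expected value of (1 + citations) by roughly exp(b) on the geometric-mean scale.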
Abstract
Identifying the statistical distribution that best fits citation data is important to allow robust and powerful quantitative analyses. Whilst previous studies have suggested that both the hooked power law and discretised lognormal distributions fit better than the power law and negative binomial distributions, no comparisons so far have covered all articles within a discipline, including those that are uncited. Based on an analysis of 26 different Scopus subject areas in seven different years, this article reports comparisons of the discretised lognormal and the hooked power law with citation data, adding 1 to citation counts in order to include zeros. The hooked power law fits better in two thirds of the subject/year combinations tested for journal articles that are at least three years old, including most medical, life and natural sciences, and for virtually all subject areas for…
