Are there too many uncited articles? Zero inflated variants of the discretised lognormal and hooked power law distributions
Mike Thelwall

TL;DR
This paper introduces zero inflated variants of the discretised lognormal and hooked power law distributions to better model citation data, accounting for inherently uncitable articles, and demonstrates their improved fit across multiple categories.
Contribution
The paper proposes two new zero inflated models for citation distributions and provides algorithms for fitting them, showing improved data fit over traditional models.
Findings
The zero inflated discretised lognormal fitted better than the standard version in all 23 Scopus categories tested.
Uncitable articles are linked to specific publication types.
Zero inflated models often outperform standard distributions.
Abstract
Although statistical models fit many citation data sets reasonably well, with the best fitting models being the hooked power law and discretised lognormal distributions, the fits are rarely close. One possible reason is that there might be more uncited articles than would be predicted by any model if some articles are inherently uncitable. Using data from 23 different Scopus categories, this article tests the assumption that removing a proportion of uncited articles from a citation dataset allows statistical distributions to have much closer fits. It also introduces two new models, the zero inflated discretised lognormal distribution and the zero inflated hooked power law distribution, and algorithms to fit them. In all 23 cases, the zero inflated version of the discretised lognormal distribution was an improvement on the standard version and in 15 out of 23 cases the zero inflated version of…
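To make the idea of zero inflation concrete, here is a minimal Python sketch of a zero inflated discretised lognormal. It is not the paper's fitting algorithm. It assumes one common discretisation (the lognormal density evaluated at citations + 1 and normalised over a truncated range), and the names `pi`, `mu`, `sigma`, and `max_count` are illustrative choices rather than the author's notation. A proportion `pi` of articles is treated as inherently uncitable, and the remaining `1 - pi` follow the ordinary discretised lognormal.

```python
import math


def lognormal_density(x, mu, sigma):
    """Continuous lognormal density at x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def discretised_lognormal_pmf(mu, sigma, max_count=10_000):
    """Probabilities for citation counts 0..max_count.

    Assumption: the density is evaluated at count + 1 (so that zero
    citations is allowed) and normalised over a truncated range.
    """
    weights = [lognormal_density(c + 1, mu, sigma) for c in range(max_count + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def zero_inflated(pmf, pi):
    """Mix a point mass at zero (weight pi) with an existing pmf."""
    mixed = [(1.0 - pi) * p for p in pmf]
    mixed[0] += pi
    return mixed


def log_likelihood(counts, pmf):
    """Log-likelihood of observed citation counts under a pmf."""
    return sum(math.log(pmf[c]) for c in counts)


# Illustrative parameter values only, not estimates from the paper.
base = discretised_lognormal_pmf(mu=1.0, sigma=1.0)
inflated = zero_inflated(base, pi=0.2)
print("P(0 citations), standard:     ", base[0])
print("P(0 citations), zero inflated:", inflated[0])
```

Under these assumptions, fitting would amount to choosing `pi`, `mu`, and `sigma` to maximise `log_likelihood` on the observed counts, for example with a numerical optimiser. Setting `pi` to 0 recovers the standard model, which is why the zero inflated version can be compared directly against it.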
