Power-law distributions in binned empirical data
Yogesh Virkar, Aaron Clauset

TL;DR
This paper adapts a rigorous statistical framework to test for power-law distributions in binned empirical data, addressing challenges posed by data binning and tail fluctuations, and validates the approach on synthetic and real-world data.
Contribution
It extends the Clauset-Shalizi-Newman power-law testing framework to binned data, enabling more accurate analysis of heavy-tailed phenomena in various fields.
Findings
Presents methods for fitting and testing power laws in binned data.
Quantifies the impact of binning on statistical power.
Validates the approach on real-world heavy-tailed datasets.
Abstract
Many man-made and natural phenomena, including the intensity of earthquakes, population of cities and size of international wars, are believed to follow power-law distributions. The accurate identification of power-law patterns has significant consequences for correctly understanding and modeling complex systems. However, statistical evidence for or against the power-law hypothesis is complicated by large fluctuations in the empirical distribution's tail, and these are worsened when information is lost from binning the data. We adapt the statistically principled framework for testing the power-law hypothesis, developed by Clauset, Shalizi and Newman, to the case of binned data. This approach includes maximum-likelihood fitting, a hypothesis test based on the Kolmogorov-Smirnov goodness-of-fit statistic and likelihood ratio tests for comparing against alternative explanations. We…
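The maximum-likelihood fitting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a continuous power law with exponent alpha above a known lower bound b_min (taken here as the first bin edge), so that a bin [b_i, b_{i+1}) has probability (b_i/b_min)^(1-alpha) - (b_{i+1}/b_min)^(1-alpha), with the last bin open to infinity. The function names and the golden-section search are illustrative choices for this sketch, not the authors' code, and estimating b_min itself is not covered.

```python
import math


def binned_powerlaw_loglik(alpha, edges, counts):
    """Log-likelihood of bin counts under a continuous power law.

    edges:  increasing bin boundaries; edges[0] is treated as b_min and
            the last bin is [edges[-1], infinity).
    counts: one count per bin, so len(counts) == len(edges).
    """
    b_min = edges[0]
    ll = 0.0
    for i, h in enumerate(counts):
        # Complementary CDF at the lower and upper edge of bin i.
        lower = (edges[i] / b_min) ** (1.0 - alpha)
        if i + 1 < len(edges):
            upper = (edges[i + 1] / b_min) ** (1.0 - alpha)
        else:
            upper = 0.0  # open-ended last bin
        ll += h * math.log(lower - upper)
    return ll


def fit_alpha(edges, counts, lo=1.01, hi=6.0, tol=1e-8):
    """Maximize the binned log-likelihood over alpha in [lo, hi].

    Uses golden-section search, which assumes the log-likelihood has a
    single peak on the interval (an assumption of this sketch).
    """
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if binned_powerlaw_loglik(c, edges, counts) > binned_powerlaw_loglik(d, edges, counts):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return 0.5 * (a + b)


if __name__ == "__main__":
    # Hypothetical counts in power-of-two bins, for illustration only.
    example_edges = [1.0, 2.0, 4.0, 8.0, 16.0]
    example_counts = [640, 230, 85, 30, 15]
    print(fit_alpha(example_edges, example_counts))
```

Working from bin probabilities rather than raw values is what distinguishes this from the unbinned estimator: only the counts and the bin edges enter the likelihood.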
