Power-law Distributions in Information Science: Making the Case for Logarithmic Binning
Staša Milojević

TL;DR
This paper advocates for partial logarithmic binning as an effective technique to analyze power-law distributions in information science, revealing hidden trends and improving exponent estimation.
Contribution
It proposes partial logarithmic binning as the preferred method for analyzing noisy power-law tails and discusses its advantages over commonly used alternatives such as cumulative distributions.
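A minimal sketch of partial logarithmic binning in Python. It assumes positive integer data such as citation counts, and it assumes the scheme keeps unit-width bins for small values (where counts are large and noise is low) while switching to logarithmically growing bins above a threshold. The function name `partial_log_binning` and the defaults `threshold=10` and `base=2.0` are hypothetical choices for illustration, not values taken from the paper. Each count is divided by its bin width and by the sample size, so narrow head bins and wide tail bins end up on the same density scale.

```python
import numpy as np

def partial_log_binning(values, threshold=10, base=2.0):
    """Hypothetical helper: bin positive integer data (e.g. citation counts).

    Values below `threshold` get unit-width bins [k, k+1); from `threshold`
    upward the bin edges grow geometrically by a factor of `base`.
    Returns (centers, densities) for the non-empty bins, where
    density = count / (bin width * sample size).
    """
    values = np.asarray(values)
    values = values[values >= 1]          # zeros cannot be shown on log axes
    vmax = values.max()

    # Unit-width edges 1, 2, ..., threshold.
    linear_edges = np.arange(1, threshold + 1, dtype=float)
    # Geometric edges threshold*base, threshold*base**2, ... until past vmax.
    n_log = int(np.ceil(np.log(max(vmax + 1, threshold * base) / threshold) / np.log(base)))
    log_edges = threshold * base ** np.arange(1, n_log + 1)
    edges = np.concatenate([linear_edges, log_edges])

    counts, edges = np.histogram(values, bins=edges)
    widths = np.diff(edges)
    densities = counts / (widths * len(values))
    # A unit bin holds a single integer, so plot it at that integer;
    # wider bins are plotted at their geometric midpoint.
    centers = np.where(widths == 1, edges[:-1], np.sqrt(edges[:-1] * edges[1:]))
    keep = counts > 0                     # empty bins cannot be shown on log axes
    return centers[keep], densities[keep]
```

For example, `partial_log_binning([1, 1, 2, 3, 15, 15, 30])` keeps the values 1, 2 and 3 in their own bins and pools 15, 15 and 30 into the bins [10, 20) and [20, 40]. Plotting `densities` against `centers` on log-log axes gives the binned view of the tail; the threshold is a judgment call that depends on where the data start to become sparse.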
Findings
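Logarithmic binning uncovers information and trends that are not visible in noisy power-law tails.
Least-squares fitting on logarithmically binned data can, in some cases, be a useful complement to maximum likelihood for exponent estimation.
Cumulative distributions can make it hard to distinguish noise from genuine features and to obtain an accurate power-law exponent.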
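The least-squares finding can be sketched in Python. This assumes a continuous power law with density p(x) ∝ x^(−α) above a known cutoff x_min; the helper names, the choice of five bins per decade, and the synthetic sample (α = 2.5) are illustrative assumptions. Because the bin edges grow by a constant ratio, a pure power law keeps slope −α after binning, so an ordinary straight-line fit of log density against log bin center estimates α. For comparison, the sketch also computes the standard continuous maximum-likelihood estimator, α̂ = 1 + n / Σ ln(x_i / x_min); this is a common textbook estimator, not necessarily the exact variant discussed in the paper.

```python
import numpy as np

def sample_power_law(alpha, xmin, n, rng):
    """Draw n values from p(x) ~ x**(-alpha), x >= xmin, by inverse-transform sampling."""
    u = rng.random(n)                                  # uniform on [0, 1)
    return xmin * (1.0 - u) ** (-1.0 / (alpha - 1.0))

def ls_exponent_log_binned(x, xmin, bins_per_decade=5):
    """Least-squares estimate of alpha from logarithmically binned densities."""
    x = np.asarray(x, dtype=float)
    x = x[x >= xmin]
    n_bins = max(2, int(np.ceil(np.log10(x.max() / xmin) * bins_per_decade)))
    edges = np.geomspace(xmin, x.max(), n_bins + 1)    # constant ratio between edges
    counts, edges = np.histogram(x, bins=edges)
    density = counts / (np.diff(edges) * len(x))
    centers = np.sqrt(edges[:-1] * edges[1:])          # geometric bin midpoints
    keep = counts > 0                                  # log10(0) is undefined
    slope, _ = np.polyfit(np.log10(centers[keep]), np.log10(density[keep]), 1)
    return -slope                                      # p(x) ~ x**(-alpha)

def mle_exponent(x, xmin):
    """Continuous maximum-likelihood estimate: 1 + n / sum(ln(x / xmin))."""
    x = np.asarray(x, dtype=float)
    x = x[x >= xmin]
    return 1.0 + len(x) / np.sum(np.log(x / xmin))

rng = np.random.default_rng(42)
sample = sample_power_law(alpha=2.5, xmin=1.0, n=50_000, rng=rng)
alpha_ls = ls_exponent_log_binned(sample, xmin=1.0)
alpha_ml = mle_exponent(sample, xmin=1.0)
print(f"least squares on log bins: {alpha_ls:.3f}, maximum likelihood: {alpha_ml:.3f}")
```

On a clean synthetic sample like this one, both estimates should land near 2.5. The sparsest tail bins carry the most noise and empty bins are dropped before fitting, so on real data the two numbers will generally differ somewhat. That difference is itself a useful diagnostic.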
Abstract
We suggest partial logarithmic binning as the method of choice for uncovering the nature of many distributions encountered in information science (IS). Logarithmic binning retrieves information and trends "not visible" in noisy power-law tails. We also argue that obtaining the exponent from logarithmically binned data using a simple least-squares method is, in some cases, warranted in addition to methods such as maximum likelihood. We also show why the often-used cumulative distributions can make it difficult to distinguish noise from genuine features and to obtain an accurate power-law exponent of the underlying distribution. The treatment is non-technical, aimed at IS researchers with little or no background in mathematics.
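The caution about cumulative distributions can be made concrete. Assume a continuous power law p(x) = C x^(−α) for x ≥ x_min. Integrating gives P(X ≥ x) = C x^(1−α) / (α − 1), so on log-log axes the cumulative curve has slope 1 − α rather than −α, and any exponent read from it must be shifted by one. The Python sketch below uses hypothetical helper names (`empirical_ccdf`, `alpha_from_ccdf_slope`). It also illustrates why neighboring cumulative points are not independent: each one counts every observation at or above it, so a single extra observation raises all the points to its left. This is one way a cumulative plot can blur the line between noise and genuine features.

```python
import numpy as np

def empirical_ccdf(x):
    """Return the distinct values of x and the fraction of observations >= each one."""
    x = np.sort(np.asarray(x, dtype=float))
    values, first_idx = np.unique(x, return_index=True)
    # In a sorted array, the first index of a value equals the number of
    # observations strictly smaller than it.
    ccdf = 1.0 - first_idx / len(x)
    return values, ccdf

def alpha_from_ccdf_slope(slope):
    """Convert a log-log CCDF slope (which equals 1 - alpha) to the density exponent."""
    return 1.0 - slope

# For an exact power law with alpha = 2.5 and x_min = 1, P(X >= x) = x**(-1.5),
# so the fitted log-log slope is -1.5 and the recovered density exponent is 2.5.
xs = np.geomspace(1.0, 1000.0, 20)
slope, _ = np.polyfit(np.log10(xs), np.log10(xs ** (1 - 2.5)), 1)
print(alpha_from_ccdf_slope(slope))

values, ccdf = empirical_ccdf([1, 2, 2, 3])
```

For the toy sample [1, 2, 2, 3], `empirical_ccdf` returns the values 1, 2 and 3 with fractions 1, 0.75 and 0.25. Forgetting the shift by one is an easy way to misreport an exponent taken from a cumulative plot.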
