Power-law Distributions in Information Science - Making the Case for Logarithmic Binning

Staša Milojević

arXiv:1011.1533·physics.soc-ph·April 3, 2012

TL;DR

This paper advocates partial logarithmic binning for analyzing power-law distributions in information science. Binning reveals trends hidden in noisy power-law tails and, in some cases, justifies estimating the exponent with a simple least-squares fit.

Contribution

It recommends partial logarithmic binning as the preferred method for analyzing noisy power-law tails and explains its advantages over the commonly used cumulative distributions.

Findings

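01 Logarithmic binning recovers information and trends that are not visible in noisy power-law tails.

02 A simple least-squares fit to logarithmically binned data is, in some cases, a warranted way to estimate the exponent, alongside methods such as maximum likelihood.

03 Cumulative distributions can make it hard to distinguish noise from genuine features and to obtain an accurate exponent for the underlying distribution.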

Abstract

We suggest partial logarithmic binning as the method of choice for uncovering the nature of many distributions encountered in information science (IS). Logarithmic binning retrieves information and trends "not visible" in noisy power-law tails. We also argue that obtaining the exponent from logarithmically binned data using a simple least-squares method is in some cases warranted, in addition to methods such as maximum likelihood. We also show why the often-used cumulative distributions can make it difficult to distinguish noise from genuine features and to obtain an accurate power-law exponent of the underlying distribution. The treatment is non-technical, aimed at IS researchers with little or no background in mathematics.
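To make the idea of logarithmic binning followed by a least-squares fit concrete, here is a minimal Python sketch using NumPy. It is an illustration under stated assumptions, not the paper's procedure. It bins the whole range logarithmically rather than reproducing the partial variant, it assumes continuous data, and the function names, the `bins_per_decade` parameter, and the `min_count` cutoff for sparse bins are all hypothetical choices made for this example.

```python
import numpy as np


def sample_power_law(alpha, x_min, size, rng):
    """Draw continuous samples with density proportional to x**(-alpha), x >= x_min.

    Uses inverse-transform sampling; requires alpha > 1.
    """
    u = rng.random(size)  # uniform on [0, 1)
    return x_min * (1.0 - u) ** (-1.0 / (alpha - 1.0))


def log_binned_density(values, bins_per_decade=10):
    """Histogram positive data in bins of equal width on a log scale.

    Returns (centers, density, counts). Each count is divided by its bin
    width and by the sample size, so bins of different width are comparable.
    """
    values = np.asarray(values, dtype=float)
    values = values[values > 0]
    lo, hi = values.min(), values.max()
    if hi <= lo:
        raise ValueError("need at least two distinct positive values")
    n_bins = max(1, int(np.ceil(np.log10(hi / lo) * bins_per_decade)))
    edges = np.logspace(np.log10(lo), np.log10(hi), n_bins + 1)
    edges[0], edges[-1] = lo, hi  # guard against rounding at the end points
    counts, _ = np.histogram(values, bins=edges)
    widths = np.diff(edges)
    centers = np.sqrt(edges[:-1] * edges[1:])  # geometric mean of bin edges
    density = counts / (widths * values.size)
    return centers, density, counts


def fit_exponent(centers, density, counts, min_count=5):
    """Estimate alpha as minus the least-squares slope in log-log space.

    Bins with fewer than min_count observations are left out of the fit;
    this cutoff is an assumption of the sketch, not a rule from the paper.
    min_count must be at least 1 so that empty bins are never logged.
    """
    keep = counts >= min_count
    slope, _ = np.polyfit(np.log10(centers[keep]), np.log10(density[keep]), 1)
    return -slope


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = sample_power_law(alpha=2.5, x_min=1.0, size=100_000, rng=rng)
    centers, density, counts = log_binned_density(data)
    print("estimated exponent:", fit_exponent(centers, density, counts))
```

Dividing each count by its bin width is what turns the binned counts back into an estimate of the density. Without that step, the fitted slope would be shallower than the true exponent by one.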
