Universal deterministic patterns in stochastic count data
Zhixing Cao, Yiling Wang, Ramon Grima

TL;DR
This paper uncovers universal deterministic patterns in the mean-Fano factor relationship across diverse stochastic count datasets, supported by a theoretical model linking these patterns to sample size.
Contribution
It introduces a theory explaining how these patterns emerge when data sampled from discrete probability distributions are organized in matrix form, and shows that the patterns depend only on the sample size.
Findings
Patterns are observed across genomics, paper citations, commerce, ecology, disease outbreaks, and employment statistics.
Theoretical model accurately predicts the patterns.
Patterns depend solely on sample size.
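A short calculation shows why a dependence on sample size alone is plausible. This is elementary algebra offered as an illustration, not the authors' derivation, which the abstract describes only in outline. It assumes each row of the count matrix holds n counts x_1, ..., x_n and that the Fano factor F is computed with the unbiased sample variance s^2 (denominator n - 1):

```latex
\begin{aligned}
S &= \sum_{j=1}^{n} x_j, \qquad Q = \sum_{j=1}^{n} x_j^2, \qquad \mu = \frac{S}{n},\\
s^2 &= \frac{1}{n-1}\sum_{j=1}^{n}(x_j-\mu)^2 = \frac{Q - S^2/n}{n-1},\\
F &= \frac{s^2}{\mu} = \frac{n\,(Q - S^2/n)}{(n-1)\,S} = \frac{n}{n-1}\left(\frac{Q}{S} - \mu\right).
\end{aligned}
```

Rows that share the same ratio Q/S therefore fall on a straight line of slope -n/(n-1) in the mean-Fano plane. For example, a row containing only 0s and 1s has Q = S, so F = n(1 - μ)/(n - 1), a line fixed entirely by n. With the population variance (denominator n) the line becomes F = Q/S - μ instead.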
Abstract
We report the existence of deterministic patterns in plots showing the relationship between the mean and the Fano factor (ratio of variance to mean) of stochastic count data. These patterns are found in a wide variety of datasets, including those from genomics, paper citations, commerce, ecology, disease outbreaks, and employment statistics. We develop a theory showing that the patterns naturally emerge when data sampled from discrete probability distributions is organised in matrix form. The theory precisely predicts the patterns and shows that they are a function of only one variable: the sample size.
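The matrix-form setting in the abstract can be explored numerically. The sketch below is a hypothetical illustration, not the authors' code: it assumes Poisson-distributed counts with small rates, a sample size of `n_samples = 20` columns, and the unbiased variance (`ddof=1`). It computes the per-row mean and Fano factor and checks them against the elementary identity F = n(Q/S - μ)/(n - 1), where S and Q are a row's sum and sum of squares. Rows whose nonzero entries are all 1 (Q = S) then fall exactly on the line F = n(1 - μ)/(n - 1).

```python
import numpy as np


def mean_fano(counts: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Per-row mean and Fano factor of a (rows x samples) count matrix.

    Uses the unbiased sample variance (ddof=1). All-zero rows must be
    removed beforehand, since their Fano factor is undefined.
    """
    mean = counts.mean(axis=1)
    var = counts.var(axis=1, ddof=1)
    return mean, var / mean


rng = np.random.default_rng(0)
n_samples = 20  # hypothetical sample size (number of columns)
n_rows = 2000   # hypothetical number of rows (e.g. genes)

# Hypothetical data: Poisson counts with small, row-specific rates.
rates = rng.uniform(0.01, 0.5, size=n_rows)
counts = rng.poisson(rates[:, None], size=(n_rows, n_samples))
counts = counts[counts.sum(axis=1) > 0]  # drop all-zero rows

mean, fano = mean_fano(counts)

# Ratio Q/S per row: sum of squared counts over sum of counts.
ratio = (counts ** 2).sum(axis=1) / counts.sum(axis=1)
predicted = n_samples * (ratio - mean) / (n_samples - 1)

# Rows containing only 0s and 1s have Q/S == 1.
binary = counts.max(axis=1) == 1
print(f"{binary.sum()} of {len(counts)} rows contain only 0s and 1s")
print("identity holds for every row:", np.allclose(fano, predicted))
```

Plotting `fano` against `mean` for such data shows points stacked on parallel lines, one per distinct value of Q/S, with the common slope -n/(n - 1) set by the sample size alone.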
Taxonomy
Topics: Bayesian Methods and Mixture Models
