Sampling bias in systems with structural heterogeneity and limited internal diffusion
Jukka-Pekka Onnela, Neil F. Johnson, Sean Gourley, Gesine Reinert, and Michael Spagat

TL;DR
This paper investigates how structural heterogeneity and limited diffusion in complex systems can cause sampling bias, providing a framework to quantify and correct this bias, with implications for social and biological data analysis.
Contribution
It introduces a general framework to quantify sampling bias due to heterogeneity and diffusion limits, including an explicit correction factor applicable to various systems.
Findings
Sampling bias can significantly distort global system characteristics.
Applied to a recent high-profile survey of conflict mortality in Iraq, the analysis suggests a significant overestimate of deaths.
The framework enables more accurate inference from heterogeneous, diffusion-limited data.
Abstract
Complex systems research is becoming increasingly data-driven, particularly in the social and biological domains. Many of the systems from which sample data are collected feature structural heterogeneity at the mesoscopic scale (i.e. communities) and limited inter-community diffusion. Here we show that the interplay between these two features can yield a significant bias in the global characteristics inferred from the data. We present a general framework to quantify this bias, and derive an explicit corrective factor for a wide class of systems. Applying our analysis to a recent high-profile survey of conflict mortality in Iraq suggests a significant overestimate of deaths.
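The kind of bias the abstract describes can be illustrated with a toy calculation. The sketch below is not the paper's framework or its corrective factor; it is a minimal two-community example with made-up numbers, assuming that units do not mix between communities and that the sample over-represents one community relative to its population share. The community names, rates, and function names are all hypothetical.

```python
# Toy illustration (hypothetical numbers, not taken from the paper):
# two communities with different event rates and no mixing between them.

def naive_estimate(sample_counts, rates):
    """Average rate over the sample, ignoring community membership."""
    total = sum(sample_counts.values())
    return sum(sample_counts[c] * rates[c] for c in sample_counts) / total

def true_rate(population, rates):
    """Population-weighted average rate across all communities."""
    total = sum(population.values())
    return sum(population[c] * rates[c] for c in population) / total

def bias_factor(population, sample_counts, rates):
    """Ratio of the naive sample estimate to the true global rate."""
    return naive_estimate(sample_counts, rates) / true_rate(population, rates)

# Assumed setup: a small high-rate community and a large low-rate one.
population = {"exposed": 2_000, "sheltered": 8_000}
rates = {"exposed": 0.05, "sheltered": 0.01}

# The sample draws equally from both, so "exposed" is over-represented
# (50% of the sample versus 20% of the population).
sample_counts = {"exposed": 500, "sheltered": 500}

naive = naive_estimate(sample_counts, rates)
truth = true_rate(population, rates)
factor = bias_factor(population, sample_counts, rates)

# Dividing the naive estimate by the bias factor recovers the true rate.
corrected = naive / factor

print(f"naive={naive:.4f} true={truth:.4f} factor={factor:.3f}")
```

With these assumed numbers the naive estimate is 0.03 against a true rate of 0.018, an overestimate by a factor of about 1.67. In practice the community-level rates and population shares are not known in advance, which is why a modelling framework is needed to estimate the factor.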
