Estimation of global network statistics from incomplete data
Catherine A. Bliss, Christopher M. Danforth, and Peter Sheridan Dodds

TL;DR
This paper develops scaling methods that estimate global network statistics from incomplete data without assuming a known network model, and validates them on simulated networks, empirical networks, and large-scale social network data from Twitter.
Contribution
It introduces novel, transparent scaling techniques for predicting network statistics from partial data, applicable across various network types without prior knowledge of their generative processes.
Findings
Accurately estimates degree distributions from partial data.
Validates methods on simulated and real-world networks.
Applies techniques to Twitter data, supporting Dunbar's hypothesis.
Abstract
Complex networks underlie an enormous variety of social, biological, physical, and virtual systems. A profound complication for the science of complex networks is that in most cases, observing all nodes and all network interactions is impossible. Previous work addressing the impacts of partial network data is surprisingly limited, focuses primarily on missing nodes, and suggests that network statistics derived from subsampled data are not suitable estimators for the same network statistics describing the overall network topology. We generate scaling methods to predict true network statistics, including the degree distribution, from only partial knowledge of nodes, links, or weights. Our methods are transparent and do not assume a known generating process for the network, thus enabling prediction of network statistics for a wide variety of applications. We validate analytical results on…
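To make the idea of a scaling estimator concrete, here is a minimal Python sketch for one simple case: recovering the total edge count and the average degree from the subgraph induced on a uniformly random sample of nodes. It is an illustration under stated assumptions, not necessarily the estimators derived in the paper. It assumes that nodes are sampled uniformly at random without replacement, that the total number of nodes N is known, and that the graph is simple and undirected. Under those assumptions each edge survives in the sample with probability n(n-1)/(N(N-1)), so the observed edge count is divided by that factor. The function names are hypothetical.

```python
import random


def induced_edge_count(edges, sampled_nodes):
    """Count edges whose two endpoints both lie in the sampled node set."""
    sampled = set(sampled_nodes)
    return sum(1 for u, v in edges if u in sampled and v in sampled)


def estimate_total_edges(m_observed, n_sampled, n_total):
    """Scale an observed edge count up to the full network.

    Assumption: uniform node sampling without replacement, so an edge is
    retained with probability n(n-1) / (N(N-1)).
    """
    if n_sampled < 2:
        raise ValueError("need at least two sampled nodes")
    return m_observed * n_total * (n_total - 1) / (n_sampled * (n_sampled - 1))


def estimate_average_degree(m_observed, n_sampled, n_total):
    """Average degree of the full network, 2M/N, using the scaled edge count."""
    return 2.0 * estimate_total_edges(m_observed, n_sampled, n_total) / n_total


# Toy check on a complete graph with 10 nodes (45 edges, every degree is 9).
N = 10
edges = [(u, v) for u in range(N) for v in range(u + 1, N)]
rng = random.Random(0)          # fixed seed so the sample is reproducible
sample = rng.sample(range(N), 5)
m_obs = induced_edge_count(edges, sample)

print(estimate_total_edges(m_obs, len(sample), N))     # 45.0
print(estimate_average_degree(m_obs, len(sample), N))  # 9.0
```

The complete graph is a degenerate case in which every sample gives the exact answer. On a heterogeneous network the estimate is unbiased only in expectation, and its variance grows as the sampled fraction shrinks.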
