Effects of missing data in social networks
Gueorgi Kossinets

TL;DR
This study examines how different types of missing data affect the analysis of social network structure. It shows that data omissions can substantially bias network metrics, and it reveals some unexpected properties of networks built from multiple interaction contexts.
Contribution
It provides a comprehensive sensitivity analysis of missing data mechanisms in social networks, highlighting their impact on network statistics and robustness.
Findings
Boundary specification and fixed choice designs can significantly bias network statistics.
Actor non-response leads to underestimation of clustering and assortativity.
Networks in which actors are linked through multiple interaction contexts (affiliations) show distinctive structural properties and patterns of robustness.
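The assortativity coefficient in these findings is presumably Newman's degree assortativity: the Pearson correlation between the degrees at the two ends of each edge. Below is a minimal standard-library Python sketch of that coefficient. It assumes an undirected graph stored as a dict of neighbour sets, a representation chosen here for illustration. The function name is hypothetical, and the paper's exact definition may differ in detail.

```python
import math


def degree_assortativity(g):
    """Pearson correlation of the degrees at the two ends of every edge.

    `g` is an undirected graph given as {node: set_of_neighbours} (an
    illustrative representation, not taken from the paper). Each edge is
    visited once from each end, so both degree lists have the same mean and
    variance, and the correlation reduces to cov / var.
    Returns NaN if the graph has no edges or all edge endpoints share the
    same degree (zero variance, e.g. a regular graph).
    """
    xs, ys = [], []
    for u, nbrs in g.items():
        for v in nbrs:
            xs.append(len(g[u]))  # degree at this end of the edge
            ys.append(len(g[v]))  # degree at the other end
    if not xs:
        return math.nan
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        return math.nan
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    return cov / var


if __name__ == "__main__":
    # A star: the hub attaches only to leaves, so degree mixing is fully disassortative.
    star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
    print(degree_assortativity(star))  # -1.0
```

A positive value means high-degree actors tend to be tied to other high-degree actors. That is the quantity whose observed value shifts when ties or actors go missing.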
Abstract
We perform sensitivity analyses to assess the impact of missing data on the structural properties of social networks. The social network is conceived of as being generated by a bipartite graph, in which actors are linked together via multiple interaction contexts or affiliations. We discuss three principal missing data mechanisms: network boundary specification (non-inclusion of actors or affiliations), survey non-response, and censoring by vertex degree (fixed choice design), examining their impact on the scientific collaboration network from the Los Alamos E-print Archive as well as random bipartite graphs. The results show that network boundary specification and fixed choice designs can dramatically alter estimates of network-level statistics. The observed clustering and assortativity coefficients are overestimated via omission of interaction contexts (affiliations) or fixed choice…
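The setup in the abstract can be illustrated with a small simulation in the same spirit. The sketch generates a random bipartite actor-affiliation graph and projects it onto actors. It then applies each of the three missing data mechanisms and compares the global clustering coefficient (transitivity) with its full-network value. This is a minimal Python sketch using only the standard library. Several choices are illustrative assumptions, not the paper's procedures or code:

- the generator (every affiliation draws a fixed number of actors uniformly at random);
- the non-response rule (a tie is lost only when neither endpoint responds);
- the fixed choice rule (each actor names at most k neighbours, and a tie survives if either end names it);
- the parameter values and all function names.

```python
import random
from itertools import combinations


def random_bipartite(n_actors, n_affils, size, rng):
    # Assumption: each affiliation draws `size` distinct actors uniformly at random.
    return {a: set(rng.sample(range(n_actors), size)) for a in range(n_affils)}


def project(affils, n_actors, kept_affils=None):
    """One-mode actor network: two actors are tied if they share an observed affiliation."""
    g = {v: set() for v in range(n_actors)}
    for a, members in affils.items():
        if kept_affils is not None and a not in kept_affils:
            continue  # affiliation lies outside the specified boundary
        for u, v in combinations(members, 2):
            g[u].add(v)
            g[v].add(u)
    return g


def drop_affiliations(affils, frac, rng):
    """Boundary specification: a random fraction `frac` of affiliations is never observed."""
    return set(rng.sample(sorted(affils), round(len(affils) * (1 - frac))))


def non_response(g, respondents):
    # Assumption: a tie is observed if at least one endpoint responds.
    return {u: {v for v in nbrs if u in respondents or v in respondents}
            for u, nbrs in g.items()}


def fixed_choice(g, k, rng):
    # Assumption: each actor names at most k neighbours; a tie is kept if either end names it.
    named = {u: set(rng.sample(sorted(nbrs), min(k, len(nbrs)))) for u, nbrs in g.items()}
    return {u: {v for v in g[u] if v in named[u] or u in named[v]} for u in g}


def transitivity(g):
    """Global clustering coefficient: closed connected triples / all connected triples."""
    closed = triples = 0
    for u, nbrs in g.items():
        d = len(nbrs)
        triples += d * (d - 1) // 2  # connected triples centred on u
        closed += sum(1 for v, w in combinations(nbrs, 2) if w in g[v])
    return closed / triples if triples else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    n = 500
    affils = random_bipartite(n_actors=n, n_affils=200, size=4, rng=rng)
    full = project(affils, n)
    print("full network    ", transitivity(full))
    kept = drop_affiliations(affils, frac=0.3, rng=rng)
    print("boundary (-30%) ", transitivity(project(affils, n, kept)))
    respondents = set(rng.sample(range(n), 350))
    print("non-response    ", transitivity(non_response(full, respondents)))
    print("fixed choice k=3", transitivity(fixed_choice(full, 3, rng)))
```

The sensitivity check the abstract describes amounts to repeating this over several seeds and missingness levels, comparing each value against the full-network one. The numbers this toy model prints are not the paper's results.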
Taxonomy
Topics: Complex Network Analysis Techniques · Social Capital and Networks · Opinion Dynamics and Social Influence
