Estimating group properties in online social networks with a classifier
George Berry, Antonio Sirianni, Nathan High, Agrippa Kellum, Ingmar Weber, Michael Macy

TL;DR
This paper introduces AdjustedWalk, a three-step method combining graph walking, classifier training, and bias correction to obtain unbiased estimates of group properties in social networks, even though the network must be crawled and classifiers give biased group proportions.
Contribution
The paper presents a novel three-step procedure, AdjustedWalk, that corrects classifier bias in social network property estimation while accounting for network crawling constraints.
Findings
AdjustedWalk provides unbiased estimates in various social network tasks.
The method compares favorably with baselines on simulated and empirical graphs.
Variance increases when the classifier has low recall, reflecting a trade-off: the adjustment removes bias at the cost of additional variance.
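Why low recall inflates variance can be seen from a short derivation. This assumes the adjustment takes the common "adjusted classify-and-count" form, which the summary does not confirm. Here $\hat q$ is the raw share of nodes the classifier labels positive, and TPR (recall) and FPR are treated as known constants:

```latex
\hat p = \frac{\hat q - \mathrm{FPR}}{\mathrm{TPR} - \mathrm{FPR}}
\quad\Longrightarrow\quad
\operatorname{Var}(\hat p) = \frac{\operatorname{Var}(\hat q)}{(\mathrm{TPR} - \mathrm{FPR})^{2}}
% Example: TPR = 0.9, FPR = 0.1 gives a factor 1/0.64 \approx 1.56;
%          TPR = 0.3, FPR = 0.1 gives a factor 1/0.04 = 25.
```

Lowering recall shrinks the denominator, so the same sampling noise in $\hat q$ becomes much larger noise in the corrected estimate.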
Abstract
We consider the problem of obtaining unbiased estimates of group properties in social networks when using a classifier for node labels. Inference for this problem is complicated by two factors: the network is not known and must be crawled, and even high-performance classifiers provide biased estimates of group proportions. We propose and evaluate AdjustedWalk for addressing this problem. This is a three-step procedure which entails: 1) walking the graph starting from an arbitrary node; 2) learning a classifier on the nodes in the walk; and 3) applying a post-hoc adjustment to classification labels. The walk step provides the information necessary to make inferences over the nodes and edges, while the adjustment step corrects for classifier bias in estimating group proportions. This process provides de-biased estimates at the cost of additional variance. We evaluate AdjustedWalk on four…
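The three steps in the abstract can be sketched on a toy graph. This is a minimal illustration under stated assumptions, not the authors' implementation. The graph, the fixed-error-rate stand-in for a trained classifier, the hand-labeled validation set, and every function name (`make_graph`, `random_walk`, `reweighted_proportion`, `adjusted_proportion`, `demo`) are hypothetical. Two estimator choices are also assumptions. Visits are re-weighted by 1/degree, because a simple random walk samples nodes in proportion to their degree. The post-hoc adjustment is taken to be the adjusted classify-and-count inversion p = (q - FPR) / (TPR - FPR).

```python
import random


def make_graph(n, rng):
    """Hypothetical toy network: a ring lattice (connected, contains triangles)
    plus extra random edges for group-1 nodes, so degree depends on group."""
    labels = {v: int(rng.random() < 0.5) for v in range(n)}
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for k in (1, 2):
            u = (v + k) % n
            adj[v].add(u)
            adj[u].add(v)
    for v in range(n):
        if labels[v] == 1:
            for _ in range(3):
                u = rng.randrange(n)
                if u != v:
                    adj[v].add(u)
                    adj[u].add(v)
    return {v: sorted(nbrs) for v, nbrs in adj.items()}, labels


def random_walk(adj, start, steps, rng):
    """Step 1: simple random walk, moving to a uniformly chosen neighbor."""
    walk = [start]
    for _ in range(steps - 1):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk


def reweighted_proportion(walk, adj, labels):
    """Share of nodes with label 1, weighting each visit by 1/degree to undo
    the walk's preference for high-degree nodes."""
    num = sum(labels[v] / len(adj[v]) for v in walk)
    den = sum(1 / len(adj[v]) for v in walk)
    return num / den


def adjusted_proportion(raw, tpr, fpr):
    """Step 3 (assumed form): solve E[raw] = tpr * p + fpr * (1 - p) for p,
    then clip to [0, 1]."""
    p = (raw - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p))


def demo(seed=0, n=2000, steps=30000, n_val=800, tpr=0.5, fpr=0.05):
    rng = random.Random(seed)
    adj, labels = make_graph(n, rng)
    walk = random_walk(adj, rng.randrange(n), steps, rng)
    # Step 2 stand-in: simulate a classifier with fixed, hypothetical error
    # rates instead of training one; each node gets one fixed prediction.
    preds = {v: int(rng.random() < (tpr if labels[v] else fpr)) for v in adj}
    # Estimate error rates on the first distinct walk nodes, treated as a
    # hypothetical hand-labeled validation set.
    val = list(dict.fromkeys(walk))[:n_val]
    pos = [preds[v] for v in val if labels[v] == 1]
    neg = [preds[v] for v in val if labels[v] == 0]
    tpr_hat, fpr_hat = sum(pos) / len(pos), sum(neg) / len(neg)
    raw = reweighted_proportion(walk, adj, preds)
    true_p = sum(labels.values()) / n
    return true_p, raw, adjusted_proportion(raw, tpr_hat, fpr_hat)
```

`demo()` returns the true group share, the re-weighted share of predicted labels, and the adjusted share. With TPR = 0.5 and FPR = 0.05 and a true share near 0.5, the unadjusted figure is expected near 0.5 × 0.5 + 0.05 × 0.5 = 0.275. The adjusted figure recovers the true share up to sampling noise, which grows as TPR - FPR shrinks.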
