Estimating Clique Composition and Size Distributions from Sampled Network Data
Minas Gjoka, Emily Smith, Carter T. Butts

TL;DR
This paper presents methods to accurately estimate the distribution of clique sizes and compositions in large networks from sampled data, with applications to social graphs like Facebook.
Contribution
It introduces two unbiased estimators of the clique size distribution from sampled network data: one that exploits the labeling of sampled nodes' neighbors and one that does not require this information.
Findings
The estimators are compared on a variety of real-world graphs, with suggestions for their use.
They can estimate clique composition by attributes such as gender.
Application to Facebook data demonstrates practical utility.
Abstract
Cliques are defined as complete graphs or subgraphs; they are the strongest form of cohesive subgroup, and are of interest in both social science and engineering contexts. In this paper we show how to efficiently estimate the distribution of clique sizes from a probability sample of nodes obtained from a graph (e.g., by independence or link-trace sampling). We introduce two types of unbiased estimators, one of which exploits labeling of sampled nodes' neighbors and one of which does not require this information. We compare the estimators on a variety of real-world graphs and provide suggestions for their use. We generalize our estimators to cases in which cliques are distinguished not only by size but also by node attributes, allowing us to estimate clique composition by size. Finally, we apply our methodology to a sample of Facebook users to estimate the clique size distribution by…
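To make the idea of estimating clique counts from a probability sample of nodes concrete, here is a minimal sketch in Python. It is not the paper's estimator: it is a generic Hansen-Hurwitz-style reweighting, assuming nodes are drawn with replacement with known per-draw probabilities and that the links among each sampled node's neighbors can be observed. It counts every complete subgraph on k nodes (not only maximal cliques), which is also an assumption of this sketch. All function and variable names are hypothetical.

```python
def _count_cliques(adj, candidates, size):
    """Count cliques of `size` nodes drawn from the ordered list `candidates`."""
    if size == 0:
        return 1
    total = 0
    for i, u in enumerate(candidates):
        # Keep only later candidates adjacent to u, so each clique is counted once.
        rest = [w for w in candidates[i + 1:] if w in adj[u]]
        total += _count_cliques(adj, rest, size - 1)
    return total


def cliques_containing(adj, v, k):
    """Number of k-node cliques that include node v (uses only v's ego network)."""
    if k == 1:
        return 1
    return _count_cliques(adj, sorted(adj[v]), k - 1)


def estimate_clique_counts(adj, sample, probs, max_k):
    """Estimate the number of k-cliques for k = 1..max_k.

    sample: nodes drawn with replacement; probs[v]: per-draw probability of v.
    A k-clique is seen from each of its k members, hence the division by k.
    """
    n = len(sample)
    return {
        k: sum(cliques_containing(adj, v, k) / (k * probs[v]) for v in sample) / n
        for k in range(1, max_k + 1)
    }


# Toy graph (illustrative only): triangle 0-1-2 with a pendant node 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
uniform = {v: 1 / len(adj) for v in adj}
# Sampling every node exactly once under uniform probabilities recovers the true counts.
census = estimate_clique_counts(adj, [0, 1, 2, 3], uniform, max_k=3)
```

Dividing each node's count by k corrects for the fact that a clique of size k can be observed from any of its k members, and dividing by the draw probability corrects for unequal sampling, as arises under link-trace designs. Normalizing the estimated counts over k gives an estimated clique size distribution.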
