The ground truth about metadata and community detection in networks
Leto Peel, Daniel B. Larremore, and Aaron Clauset

TL;DR
This paper critically examines the practice of treating metadata as ground truth in community detection, proves fundamental limitations of that practice, and offers statistical tools for exploring the relationship between metadata and network structure.
Contribution
It demonstrates that metadata are not equivalent to ground truth, establishes a No Free Lunch theorem for community detection, and introduces methods to quantify metadata-community relationships.
Findings
Metadata are not the same as ground truth.
No single algorithm can be optimal for every community detection task (a No Free Lunch theorem).
Statistical techniques are introduced to quantify the relationship between metadata and community structure.
Abstract
Across many scientific domains, there is a common need to automatically extract a simplified view or coarse-graining of how a complex system's components interact. This general task is called community detection in networks and is analogous to searching for clusters in independent vector data. It is common to evaluate the performance of community detection algorithms by their ability to find so-called "ground truth" communities. This works well in synthetic networks with planted communities because such networks' links are formed explicitly based on those known communities. However, there are no planted communities in real world networks. Instead, it is standard practice to treat some observed discrete-valued node attributes, or metadata, as ground truth. Here, we show that metadata are not the same as ground truth, and that treating them as such induces severe theoretical and practical…
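The evaluation practice the abstract describes, scoring a detected partition by how well it matches node metadata, is often carried out with a partition-similarity measure. The block below is a minimal sketch of one such measure, normalized mutual information (NMI), using only the Python standard library. It is an illustration of the practice under critique, not one of the statistical techniques introduced by the paper; the function name and the toy labels are assumptions made for this example.

```python
from collections import Counter
from math import log


def normalized_mutual_information(labels_a, labels_b):
    """NMI between two labelings of the same nodes.

    Normalised by the mean of the two entropies, so the result is 1.0 for
    partitions that are identical up to relabeling and 0.0 for partitions
    that are statistically independent.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same nodes")
    n = len(labels_a)
    if n == 0:
        raise ValueError("labelings must not be empty")

    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))

    # Shannon entropy of each partition (natural logarithm).
    h_a = -sum((c / n) * log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * log(c / n) for c in count_b.values())

    # Mutual information from the joint and marginal group sizes.
    mi = sum(
        (c / n) * log((c * n) / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )

    mean_entropy = (h_a + h_b) / 2
    if mean_entropy == 0:
        # Both partitions place every node in a single group.
        return 1.0
    return mi / mean_entropy


# Hypothetical toy data: a detected partition and a metadata attribute.
detected = [0, 0, 1, 1]
metadata_same = ["x", "x", "y", "y"]   # same grouping, different names
metadata_indep = ["x", "y", "x", "y"]  # cuts across the detected groups

score_same = normalized_mutual_information(detected, metadata_same)
score_indep = normalized_mutual_information(detected, metadata_indep)
```

In line with the paper's argument, a low score here does not by itself show that an algorithm failed: the metadata may simply be unrelated to the network's structure, or may reflect a different aspect of it than the one the algorithm recovered.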
