The ground truth about metadata and community detection in networks

Leto Peel, Daniel B. Larremore, and Aaron Clauset

arXiv:1608.05878 · cs.SI · September 28, 2017

TL;DR

This paper critically examines the practice of treating node metadata as ground truth when evaluating community detection. It proves fundamental limitations of that practice and offers statistical tools for exploring how metadata relate to network structure.

Contribution

The paper demonstrates that metadata are not equivalent to ground truth, establishes a No Free Lunch theorem for community detection, and introduces statistical methods to quantify the relationship between metadata and community structure.

Findings

01

Metadata are not the same as ground truth.

02

No Free Lunch: no single algorithm can be optimal across all possible community detection tasks.

03

Statistical techniques can quantify the relationship between metadata and community structure.
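To make the third finding concrete, the following is a minimal sketch of one way to ask whether metadata and network structure are related at all. It is not the paper's method: it is a plain label-permutation test on the fraction of edges joining nodes with the same metadata value, which only detects assortative association. The toy graph, the `metadata` labels, and the function names `same_label_fraction` and `permutation_p_value` are all hypothetical and chosen for illustration.

```python
import random

# Hypothetical toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
# Hypothetical metadata: one discrete label per node, aligned with the triangles.
metadata = ["a", "a", "a", "b", "b", "b"]


def same_label_fraction(edges, labels):
    """Fraction of edges whose two endpoints share a metadata label."""
    return sum(labels[i] == labels[j] for i, j in edges) / len(edges)


def permutation_p_value(edges, labels, n_permutations=2000, seed=0):
    """Compare the observed statistic with a null that shuffles labels over nodes."""
    rng = random.Random(seed)
    observed = same_label_fraction(edges, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if same_label_fraction(edges, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_permutations + 1)


observed, p_value = permutation_p_value(edges, metadata)
print(f"same-label edge fraction: {observed:.3f}, p-value: {p_value:.3f}")
```

A small p-value here says only that the metadata correlate with the links more than shuffled labels do. It does not say that the metadata are the communities, which is the distinction the findings above draw.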

Abstract

Across many scientific domains, there is a common need to automatically extract a simplified view or coarse-graining of how a complex system's components interact. This general task is called community detection in networks and is analogous to searching for clusters in independent vector data. It is common to evaluate the performance of community detection algorithms by their ability to find so-called "ground truth" communities. This works well in synthetic networks with planted communities because such networks' links are formed explicitly based on those known communities. However, there are no planted communities in real world networks. Instead, it is standard practice to treat some observed discrete-valued node attributes, or metadata, as ground truth. Here, we show that metadata are not the same as ground truth, and that treating them as such induces severe theoretical and practical…
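The abstract's point about synthetic benchmarks can be illustrated with a small sketch. Assuming a simple planted-partition model (the function names `planted_partition` and `nmi`, and the parameters `p_in` and `p_out`, are hypothetical choices for this example, not taken from the paper), links are generated directly from known group labels, so the planted labels really are ground truth for that network. A metadata labeling, by contrast, may be unrelated to how the links formed; here it is constructed to be independent of the planted groups.

```python
import math
import random
from collections import Counter


def planted_partition(n_per_group, n_groups, p_in, p_out, seed=0):
    """Sample an undirected graph whose links depend only on planted group labels."""
    rng = random.Random(seed)
    labels = [g for g in range(n_groups) for _ in range(n_per_group)]
    edges = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                edges.append((i, j))
    return labels, edges


def entropy(counts, n):
    """Shannon entropy (in nats) of a partition given its group sizes."""
    return -sum(c / n * math.log(c / n) for c in counts.values())


def nmi(a, b):
    """Normalized mutual information between two labelings of the same nodes."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    h_a, h_b = entropy(ca, n), entropy(cb, n)
    if h_a == 0 or h_b == 0:
        return 1.0 if h_a == h_b else 0.0
    mi = sum(c / n * math.log(c * n / (ca[x] * cb[y])) for (x, y), c in cab.items())
    return 2 * mi / (h_a + h_b)


planted, edges = planted_partition(n_per_group=20, n_groups=2, p_in=0.5, p_out=0.05)
within = sum(planted[i] == planted[j] for i, j in edges)
between = len(edges) - within

# Assumed metadata: alternates by node index, so it is independent of the planted groups.
metadata = [i % 2 for i in range(len(planted))]

print("within-group edges:", within, "between-group edges:", between)
print("NMI(planted, planted): ", nmi(planted, planted))
print("NMI(planted, metadata):", nmi(planted, metadata))
```

In this sketch, an algorithm that recovered the planted groups perfectly would score zero against the metadata. That low score would reflect the metadata's irrelevance to the link structure, not a failure of the algorithm.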
