# Metrics matter in community detection

**Authors:** Arya D. McCarthy, Tongfei Chen, Rachel Rudinger, David W. Matula

arXiv: 1901.01354 · 2020-05-22

## TL;DR

This paper critically examines normalized mutual information (NMI) as an evaluation metric for community detection, shows that it can reward trivial clusterings such as the all-singletons partition, and advises one-sided AMI under the $\mathbb{M}_{\mathrm{all}}$ model as a more robust alternative.

## Contribution

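The paper analyzes the limitations of NMI and of three improvements from the literature (AMI, rrNMI, and cNMI), shows equivalences under relevant random models, and recommends one-sided AMI under the $\mathbb{M}_{\mathrm{all}}$ model (all partitions of $n$ nodes) for evaluating community detection.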

## Key findings

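- NMI exaggerates the leximin method's performance on weak communities, where leximin returns the trivial singletons clustering
- One-sided AMI under the $\mathbb{M}_{\mathrm{all}}$ model is the advised metric for evaluating community detection
- Equivalences are shown under relevant random models for the NMI improvements from the literature (AMI, rrNMI, cNMI)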
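
The "free lunch" that NMI gives to a trivial clustering can be seen on a toy example. The sketch below is illustrative only and is not taken from the paper: it assumes NMI normalized by the arithmetic mean of the two entropies, natural logarithms, and the standard two-sided AMI under the permutation model, which is a stand-in for (not the same as) the one-sided AMI under $\mathbb{M}_{\mathrm{all}}$ that the paper advises. The 12-node ground truth and all function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (nats) of a clustering given as a label list."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(u, v):
    """Mutual information (nats) between two clusterings of the same nodes."""
    n = len(u)
    cu, cv = Counter(u), Counter(v)
    joint = Counter(zip(u, v))
    return sum((c / n) * math.log(n * c / (cu[a] * cv[b]))
               for (a, b), c in joint.items())


def nmi(u, v):
    """NMI with arithmetic-mean normalization (an assumption of this sketch)."""
    hu, hv = entropy(u), entropy(v)
    if hu + hv == 0:
        return 1.0  # both clusterings are a single cluster
    return 2 * mutual_information(u, v) / (hu + hv)


def expected_mi(u, v):
    """Expected MI under the permutation model (cluster sizes held fixed)."""
    n = len(u)

    def log_fact(k):
        return math.lgamma(k + 1)  # log(k!)

    total = 0.0
    for a in Counter(u).values():
        for b in Counter(v).values():
            for nij in range(max(1, a + b - n), min(a, b) + 1):
                # Log of the hypergeometric probability of an overlap of nij.
                log_p = (log_fact(a) + log_fact(b)
                         + log_fact(n - a) + log_fact(n - b)
                         - log_fact(n) - log_fact(nij)
                         - log_fact(a - nij) - log_fact(b - nij)
                         - log_fact(n - a - b + nij))
                total += (nij / n) * math.log(n * nij / (a * b)) * math.exp(log_p)
    return total


def ami(u, v):
    """Two-sided AMI, permutation model, arithmetic-mean normalization."""
    mi, emi = mutual_information(u, v), expected_mi(u, v)
    denom = (entropy(u) + entropy(v)) / 2 - emi
    return 0.0 if abs(denom) < 1e-12 else (mi - emi) / denom


# Hypothetical ground truth: 12 nodes in three communities of four.
truth = [0] * 4 + [1] * 4 + [2] * 4
# Trivial clustering: every node in its own community.
singletons = list(range(12))

print("NMI(truth, singletons) =", round(nmi(truth, singletons), 4))
print("AMI(truth, singletons) =", round(ami(truth, singletons), 4))
```

On this example the singletons clustering scores an NMI of about 0.61 despite carrying no information about the communities, while the chance-adjusted score is zero up to floating-point rounding, because the expected mutual information for singletons equals the observed mutual information.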

## Abstract

We present a critical evaluation of normalized mutual information (NMI) as an evaluation metric for community detection. NMI exaggerates the leximin method's performance on weak communities: Does leximin, in finding the trivial singletons clustering, truly outperform eight other community detection methods? Three NMI improvements from the literature are AMI, rrNMI, and cNMI. We show equivalences under relevant random models, and for evaluating community detection, we advise one-sided AMI under the $\mathbb{M}_{\mathrm{all}}$ model (all partitions of $n$ nodes). This work seeks (1) to start a conversation on robust measurements, and (2) to advocate evaluations which do not give "free lunch".

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.01354/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/1901.01354/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/1901.01354/full.md

---
Source: https://tomesphere.com/paper/1901.01354