Normalized mutual information is a biased measure for classification and community detection
Maximilian Jerdee, Alec Kirkley, M. E. J. Newman

TL;DR
This paper identifies biases in normalized mutual information used for clustering evaluation, introduces a corrected measure, and demonstrates its impact on algorithm comparison results.
Contribution
The authors propose a modified mutual information measure that corrects two sources of bias in the normalized mutual information conventionally used to evaluate clustering and community detection results.
Findings
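Conventional normalized mutual information is biased in two ways: it ignores the information content of the contingency table, and its symmetric normalization makes the score depend spuriously on the algorithm's output.
The proposed measure remedies both shortcomings.
In numerical tests on popular network community detection algorithms, correcting the bias significantly changes which algorithm appears best.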
Abstract
Normalized mutual information is widely used as a similarity measure for evaluating the performance of clustering and classification algorithms. In this paper, we argue that results returned by the normalized mutual information are biased for two reasons: first, because they ignore the information content of the contingency table and, second, because their symmetric normalization introduces spurious dependence on algorithm output. We introduce a modified version of the mutual information that remedies both of these shortcomings. As a practical demonstration of the importance of using an unbiased measure, we perform extensive numerical tests on a basket of popular algorithms for network community detection and show that one's conclusions about which algorithm is best are significantly affected by the biases in the traditional mutual information.
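To make the abstract's concern concrete, here is a minimal Python sketch of the conventional measure only, not of the authors' corrected measure. The function names (`entropy`, `mutual_information`, `nmi_symmetric`) and the toy labels are illustrative assumptions, and the symmetric normalization shown (twice the mutual information divided by the sum of the two entropies) is one common convention among several. It shows that a trivial candidate that places every object in its own group still receives a substantial score.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return -sum((m / n) * math.log(m / n) for m in Counter(labels).values())


def mutual_information(truth, candidate):
    """Mutual information (in nats) computed from the contingency table."""
    n = len(truth)
    joint = Counter(zip(truth, candidate))  # contingency table entries
    row = Counter(truth)                    # ground-truth group sizes
    col = Counter(candidate)                # candidate group sizes
    return sum(
        (m / n) * math.log(m * n / (row[r] * col[s]))
        for (r, s), m in joint.items()
    )


def nmi_symmetric(truth, candidate):
    """Conventional NMI with symmetric (arithmetic-mean) normalization."""
    denom = entropy(truth) + entropy(candidate)
    if denom == 0:
        # Both labelings have a single group; returning 1.0 is an
        # assumed convention, not something taken from the paper.
        return 1.0
    return 2 * mutual_information(truth, candidate) / denom


# Hypothetical toy data: two ground-truth groups of four objects each.
truth = [0, 0, 0, 0, 1, 1, 1, 1]
# Trivial candidate: every object is placed in its own group.
singletons = list(range(len(truth)))

score_symmetric = nmi_symmetric(truth, singletons)                        # about 0.5
score_truth_only = mutual_information(truth, singletons) / entropy(truth)  # about 1.0
print(score_symmetric, score_truth_only)
```

In this toy case the singleton labeling carries no useful grouping, yet it scores about 0.5 under the symmetric normalization and about 1.0 when the mutual information is divided by the ground-truth entropy alone. The second number shows why changing the normalization by itself does not help: the unreduced mutual information gives full credit to a candidate whose contingency table would be costly to describe.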
Taxonomy
Topics: Complex Network Analysis Techniques · Advanced Clustering Algorithms Research · Mental Health Research Topics
