Resampled Mutual Information for Clustering and Community Detection
Cheaheon Lim

TL;DR
This paper presents ResMI, a new information-theoretic measure of clustering similarity. It is fully interpretable, robust to biases that affect existing measures (particularly at high cluster counts), and identifies meaningful community structures in real networks.
Contribution
The paper introduces ResMI, a novel clustering similarity measure that combines insights from information theory and pair counting. Like chance-corrected measures, it satisfies the constant baseline property, but it does not require adjustment terms.
Findings
ResMI is robust to common biases of existing measures, particularly with high cluster counts and asymmetric cluster distributions.
ResMI identifies meaningful community structures in two real contact tracing networks.
ResMI is fully interpretable and does not require adjustment terms.
Abstract
We introduce resampled mutual information (ResMI), a novel measure of clustering similarity that combines insights from information theoretic and pair counting approaches to clustering and community detection. Similar to chance-corrected measures, ResMI satisfies the constant baseline property, but it has the advantages of not requiring adjustment terms and being fully interpretable in the language of information theory. Experiments on synthetic datasets demonstrate that ResMI is robust to common biases exhibited by existing measures, particularly in settings with high cluster counts and asymmetric cluster distributions. Additionally, we show that ResMI identifies meaningful community structures in two real contact tracing networks.
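To make the bias at high cluster counts concrete, here is a minimal sketch of plain, unadjusted mutual information between two clusterings. It is not ResMI, whose definition is not given on this page, and the function names `mutual_information` and `mean_mi_of_random_pairs` are hypothetical, chosen only for illustration. The sketch shows that two independent random labelings receive a higher average score as the number of clusters grows, which is the kind of drifting baseline that a measure with the constant baseline property is meant to avoid.

```python
import math
import random
from collections import Counter


def mutual_information(labels_a, labels_b):
    """Plain (unadjusted) mutual information, in nats, between two labelings."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))  # contingency table counts
    count_a = Counter(labels_a)               # cluster sizes in labeling A
    count_b = Counter(labels_b)               # cluster sizes in labeling B
    mi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) )
        mi += (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def mean_mi_of_random_pairs(n_items, n_clusters, trials=50, seed=0):
    """Average MI between pairs of independent, uniformly random labelings."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        a = [rng.randrange(n_clusters) for _ in range(n_items)]
        b = [rng.randrange(n_clusters) for _ in range(n_items)]
        total += mutual_information(a, b)
    return total / trials


# Illustrative settings (assumptions, not taken from the paper's experiments).
baseline_few = mean_mi_of_random_pairs(n_items=200, n_clusters=2)
baseline_many = mean_mi_of_random_pairs(n_items=200, n_clusters=20)
print(f"mean MI, 2 clusters:  {baseline_few:.4f}")
print(f"mean MI, 20 clusters: {baseline_many:.4f}")
```

Since the labelings are independent by construction, any score above zero reflects chance agreement alone. Chance-corrected measures handle this by subtracting an expected-value adjustment term; according to the abstract, ResMI achieves a constant baseline without such a term.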
Taxonomy
Topics: Text and Document Classification Technologies · Web Data Mining and Analysis · Complex Network Analysis Techniques
