Resampled Mutual Information for Clustering and Community Detection

Cheaheon Lim

arXiv:2412.03584 · cs.SI · December 6, 2024

TL;DR

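This paper presents resampled mutual information (ResMI), an information-theoretic measure of clustering similarity that satisfies the constant baseline property without requiring adjustment terms. On synthetic data it is robust to biases that affect existing measures, particularly with high cluster counts and asymmetric cluster distributions, and it identifies meaningful community structures in two real contact tracing networks.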

Contribution

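The paper introduces ResMI, a clustering similarity measure that combines information-theoretic and pair-counting approaches. Like chance-corrected measures, it satisfies the constant baseline property, but it requires no adjustment terms and is fully interpretable in the language of information theory.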

Findings

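01. On synthetic datasets, ResMI is robust to common biases exhibited by existing measures, particularly with high cluster counts and asymmetric cluster distributions.

02. ResMI identifies meaningful community structures in two real contact tracing networks.

03. ResMI is fully interpretable in information-theoretic terms and does not require adjustment terms.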

Abstract

We introduce resampled mutual information (ResMI), a novel measure of clustering similarity that combines insights from information theoretic and pair counting approaches to clustering and community detection. Similar to chance-corrected measures, ResMI satisfies the constant baseline property, but it has the advantages of not requiring adjustment terms and being fully interpretable in the language of information theory. Experiments on synthetic datasets demonstrate that ResMI is robust to common biases exhibited by existing measures, particularly in settings with high cluster counts and asymmetric cluster distributions. Additionally, we show that ResMI identifies meaningful community structures in two real contact tracing networks.
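
To make the constant baseline property concrete, here is a minimal sketch of why plain mutual information lacks it. This is the standard, unadjusted mutual information between two labelings, not ResMI itself; this page does not give the ResMI definition. The function names `mutual_information` and `random_baseline` and the choice of a balanced reference labeling are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def mutual_information(labels_a, labels_b):
    """Plain (unadjusted) mutual information, in nats, between two labelings."""
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("labelings must have the same length")
    count_a = Counter(labels_a)               # cluster sizes in the first labeling
    count_b = Counter(labels_b)               # cluster sizes in the second labeling
    count_ab = Counter(zip(labels_a, labels_b))  # contingency table (nonzero cells)
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def random_baseline(n, k, trials=200, seed=0):
    """Average MI between a balanced k-cluster reference (illustrative choice)
    and labelings drawn uniformly at random over k clusters."""
    rng = random.Random(seed)
    reference = [i % k for i in range(n)]
    total = 0.0
    for _ in range(trials):
        candidate = [rng.randrange(k) for _ in range(n)]
        total += mutual_information(reference, candidate)
    return total / trials


if __name__ == "__main__":
    # Random labelings carry no information about the reference, yet the
    # average plain MI grows with the number of clusters k.
    for k in (2, 5, 20):
        print(k, random_baseline(100, k))
```

Under these assumptions, the average score for random labelings rises as the cluster count grows, so plain mutual information favors labelings with many clusters. A measure with a constant baseline would instead give random labelings the same expected score regardless of cluster count, which is the property the abstract attributes to ResMI and to chance-corrected measures.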

Taxonomy

Topics: Text and Document Classification Technologies · Web Data Mining and Analysis · Complex Network Analysis Techniques