Reliable data clustering with Bayesian community detection
Magnus Neuman, Jelena Smiljanić, Martin Rosvall

TL;DR
This paper applies two Bayesian community detection methods grounded in the Minimum Description Length principle to clustering noisy, high-dimensional data, where they give more reliable clusters than traditional methods.
Contribution
The paper shows that Bayesian community detection methods provide a principled, noise-resistant clustering framework for many scientific fields, uniting sparsification and clustering under principled model selection.
Findings
Outperforms traditional clustering in high-noise synthetic data
Identifies more robust gene modules in genomics data
Provides a unified, principled approach to clustering and sparsification
Abstract
From neuroscience and genomics to systems biology and ecology, researchers rely on clustering similarity data to uncover modular structure. Yet widely used clustering methods, such as hierarchical clustering, k-means, and WGCNA, lack principled model selection, leaving them susceptible to noise. A common workaround sparsifies a correlation matrix representation to remove noise before clustering, but this extra step introduces arbitrary thresholds that can distort the structure and lead to unreliable results. To detect reliable clusters, we capitalize on recent advances in network science to unite sparsification and clustering with principled model selection. We test two Bayesian community detection methods, the Degree-Corrected Stochastic Block Model and the Regularized Map Equation, both grounded in the Minimum Description Length principle for model selection. In synthetic data, they…
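To make "Minimum Description Length model selection" concrete, here is a minimal sketch of the standard two-level map equation for an undirected, weighted network. It is not the Regularized Map Equation tested in the paper, which adds a Bayesian prior that this sketch omits. The map equation assigns each partition M a description length L(M) in bits for a random walk on the network, and the preferred partition is the one with the shortest description. The function names `plogp` and `map_equation` and the toy two-triangle network are illustrative assumptions, not code from the paper.

```python
import math
from collections import defaultdict


def plogp(x: float) -> float:
    """Return x * log2(x), using the convention 0 * log2(0) = 0."""
    return x * math.log2(x) if x > 0 else 0.0


def map_equation(edges, partition):
    """Standard two-level map equation L(M), in bits, for an undirected weighted network.

    Hypothetical helper for illustration (NOT the regularized variant).
    edges: iterable of (u, v, weight); partition: dict mapping node -> module id.
    A random walker visits node a at rate p_a = s_a / (2W), where s_a is the node
    strength and W the total edge weight; each edge carries flow w / (2W) per direction.
    """
    strength = defaultdict(float)
    exit_flow = defaultdict(float)
    total = 0.0
    for u, v, w in edges:
        strength[u] += w
        strength[v] += w
        total += w
        if partition[u] != partition[v]:
            # An edge between modules carries exit flow out of both endpoint modules.
            exit_flow[partition[u]] += w
            exit_flow[partition[v]] += w
    two_w = 2.0 * total
    p = {a: s / two_w for a, s in strength.items()}
    modules = {partition[a] for a in p}
    q = {m: exit_flow[m] / two_w for m in modules}
    # Rate at which each module codebook is used: exits plus visits to member nodes.
    p_circ = defaultdict(float)
    for a, pa in p.items():
        p_circ[partition[a]] += pa
    for m in modules:
        p_circ[m] += q[m]
    q_total = sum(q.values())
    return (plogp(q_total)
            - 2.0 * sum(plogp(qm) for qm in q.values())
            - sum(plogp(pa) for pa in p.values())
            + sum(plogp(pc) for pc in p_circ.values()))


# Toy network: two triangles joined by a single bridging edge (2, 3).
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 1.0)]
one_module = {a: 0 for a in range(6)}
two_modules = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(round(map_equation(edges, one_module), 3))   # 2.557
print(round(map_equation(edges, two_modules), 3))  # 2.321
```

The two-module partition compresses the walk better (about 2.321 versus 2.557 bits per step), so MDL selects it. This choice comes from the compression criterion itself rather than from a hand-tuned edge threshold. In practice the partition is found by optimizing L(M), for example with Infomap, rather than by comparing hand-picked candidates as done here.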
Taxonomy
Topics: Bioinformatics and Genomic Networks · Gene expression and cancer classification · Bayesian Methods and Mixture Models
