Significance analysis and statistical mechanics: an application to clustering
Marta Łuksza, Michael Lässig, Johannes Berg

TL;DR
This paper maps the statistical significance of clusters onto a problem of statistical mechanics, solves that problem analytically, and applies the result to gene expression data, where the significance of a cluster turns out to be linked to the functional relationships between its genes.
Contribution
It maps the computation of the cluster p-value for randomly distributed vectors onto a well-defined statistical mechanics problem and solves it analytically, connecting the physics of quenched disorder with multiple testing statistics in clustering and related problems.
Findings
Analytical solution for cluster p-value in random data
Connection between quenched disorder physics and clustering statistics
Statistical significance of gene expression clusters linked to functional relationships between their genes
Abstract
This paper addresses the statistical significance of structures in random data: Given a set of vectors and a measure of mutual similarity, how likely does a subset of these vectors form a cluster with enhanced similarity among its elements? The computation of this cluster p-value for randomly distributed vectors is mapped onto a well-defined problem of statistical mechanics. We solve this problem analytically, establishing a connection between the physics of quenched disorder and multiple testing statistics in clustering and related problems. In an application to gene expression data, we find a remarkable link between the statistical significance of a cluster and the functional relationships between its genes.
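To make the question in the abstract concrete, the cluster p-value can be approximated by brute-force simulation for very small data sets. The sketch below is only an illustration under stated assumptions and is not the paper's analytical solution: the dot-product similarity, the sum-of-pairs cluster score, the Gaussian null model, and all function names are hypothetical choices made for this example.

```python
import itertools
import random


def similarity(u, v):
    """Dot product, used here as an assumed similarity measure."""
    return sum(a * b for a, b in zip(u, v))


def cluster_score(vectors, subset):
    """Sum of pairwise similarities among the vectors indexed by `subset`."""
    return sum(similarity(vectors[i], vectors[j])
               for i, j in itertools.combinations(subset, 2))


def best_cluster_score(vectors, k):
    """Maximum score over all subsets of size k (brute force, small N only)."""
    return max(cluster_score(vectors, s)
               for s in itertools.combinations(range(len(vectors)), k))


def random_vectors(n, dim, rng):
    """Assumed null model: i.i.d. standard Gaussian components."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


def cluster_p_value(observed_score, n, dim, k, trials=200, seed=0):
    """Monte Carlo estimate of P(best score in random data >= observed_score)."""
    rng = random.Random(seed)
    hits = sum(
        best_cluster_score(random_vectors(n, dim, rng), k) >= observed_score
        for _ in range(trials)
    )
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)


if __name__ == "__main__":
    # Hypothetical observed score for the best 3-vector cluster among 8 vectors.
    print(cluster_p_value(observed_score=6.0, n=8, dim=4, k=3))
```

Taking the maximum over all subsets is what turns this into a multiple testing problem: the best of many candidate clusters in random data can look far more similar than a single, pre-chosen subset would. The number of subsets grows combinatorially with the number of vectors, so this simulation is feasible only for toy sizes, which is one reason an analytical treatment is attractive.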
