Significance analysis and statistical mechanics: an application to clustering

Marta Łuksza; Michael Lässig; Johannes Berg

arXiv:1009.2470·q-bio.MN·May 19, 2015

TL;DR

This paper maps the statistical significance of a cluster (its p-value in random data) onto a problem of statistical mechanics, solves that problem analytically, and applies the result to gene expression data, where a cluster's significance turns out to be linked to the functional relationships between its genes.

Contribution

It establishes an analytical framework that connects the statistical significance of clusters with the statistical mechanics of quenched disorder, relating it to multiple testing statistics in clustering and related problems.

Findings

01. Analytical solution for cluster p-value in random data

02. Connection between quenched disorder physics and clustering statistics

03. Significant gene clusters linked to functional relationships

Abstract

This paper addresses the statistical significance of structures in random data: Given a set of vectors and a measure of mutual similarity, how likely does a subset of these vectors form a cluster with enhanced similarity among its elements? The computation of this cluster p-value for randomly distributed vectors is mapped onto a well-defined problem of statistical mechanics. We solve this problem analytically, establishing a connection between the physics of quenched disorder and multiple testing statistics in clustering and related problems. In an application to gene expression data, we find a remarkable link between the statistical significance of a cluster and the functional relationships between its genes.
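To make the notion of a cluster p-value concrete, here is a minimal brute-force sketch. It is not the paper's analytical solution. It is a Monte Carlo illustration under assumptions chosen for this example: similarity is the dot product, a cluster's score is the sum of pairwise similarities among its members, and the null model draws vectors uniformly on the unit sphere. All function names are hypothetical. The p-value is estimated as the fraction of random data sets in which the best-scoring subset of size k reaches the observed score; taking the maximum over all subsets is what accounts for multiple testing.

```python
import itertools
import math
import random


def dot(u, v):
    """Dot product, used here as the (assumed) similarity measure."""
    return sum(a * b for a, b in zip(u, v))


def cluster_score(vectors, members):
    """Sum of pairwise similarities among the chosen members."""
    return sum(dot(vectors[i], vectors[j])
               for i, j in itertools.combinations(members, 2))


def best_score(vectors, k):
    """Highest score over all subsets of size k (brute force, small N only)."""
    return max(cluster_score(vectors, c)
               for c in itertools.combinations(range(len(vectors)), k))


def random_unit_vector(rng, dim):
    """Draw one vector uniformly on the unit sphere (assumed null model)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(dot(x, x))
    return [a / norm for a in x]


def cluster_p_value(observed_score, n, dim, k, n_null=200, seed=0):
    """Monte Carlo estimate of P(best null cluster score >= observed score)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_null):
        null_vectors = [random_unit_vector(rng, dim) for _ in range(n)]
        if best_score(null_vectors, k) >= observed_score:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (1 + exceed) / (1 + n_null)
```

Enumerating all subsets scales combinatorially with the number of vectors, so this sketch is usable only for very small data sets. That cost is one motivation for an analytical treatment such as the one the abstract describes.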
