TL;DR
This paper introduces a Bayesian approach to agglomerative hierarchical clustering that corrects for the dimensionality of the variables being merged and supplies an automated stopping rule. It outperforms traditional mutual information measures in classification accuracy and yields consistent clusters.
Contribution
The authors develop a Bayesian model comparison framework for clustering dependent variables, providing natural regularization, an automated stopping rule, and dimensionality correction.
Findings
Bayesian clustering outperforms traditional mutual-information-based measures in classification accuracy.
The approach provides an automated thresholding mechanism.
It yields consistent clusters in fMRI data, aligning with established results.
Abstract
The use of mutual information as a similarity measure in agglomerative hierarchical clustering (AHC) raises an important issue: some correction needs to be applied for the dimensionality of variables. In this work, we formulate the decision of merging dependent multivariate normal variables in an AHC procedure as a Bayesian model comparison. We found that the Bayesian formulation naturally shrinks the empirical covariance matrix towards a matrix set a priori (e.g., the identity), provides an automated stopping rule, and corrects for dimensionality using a term that scales up the measure as a function of the dimensionality of the variables. Also, the resulting log Bayes factor is asymptotically proportional to the plug-in estimate of mutual information, with an additive correction for dimensionality in agreement with the Bayesian information criterion. We investigated the behavior of…
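The asymptotic relationship described in the abstract can be illustrated with a minimal sketch. It is not the authors' Bayes factor: it assumes jointly multivariate normal data, uses the standard plug-in estimate of Gaussian mutual information, and applies a BIC-style penalty of (p·q/2)·log N, where p·q is the number of cross-covariance parameters that the dependent model adds. The function names `gaussian_mi_plugin` and `bic_merge_score` are hypothetical, and the prior-driven shrinkage of the covariance matrix mentioned in the abstract is left out.

```python
import numpy as np


def gaussian_mi_plugin(x, y):
    """Plug-in mutual information (in nats) between two blocks of variables.

    Assumes x (N x p) and y (N x q) are jointly multivariate normal and
    uses the maximum-likelihood (biased) sample covariance.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    p = x.shape[1]
    cov = np.cov(np.hstack([x, y]), rowvar=False, bias=True)
    # I(X;Y) = 0.5 * (log|S_x| + log|S_y| - log|S_xy|)
    _, logdet_x = np.linalg.slogdet(cov[:p, :p])
    _, logdet_y = np.linalg.slogdet(cov[p:, p:])
    _, logdet_xy = np.linalg.slogdet(cov)
    return 0.5 * (logdet_x + logdet_y - logdet_xy)


def bic_merge_score(x, y):
    """BIC-style approximation to a log Bayes factor for merging x and y.

    Illustrative assumption: N * MI is the log-likelihood ratio between
    the dependent and independent Gaussian models, and the penalty counts
    the p*q extra cross-covariance parameters.
    """
    n, p = x.shape
    q = y.shape[1]
    return n * gaussian_mi_plugin(x, y) - 0.5 * p * q * np.log(n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 1000
    x = rng.standard_normal((n, 2))
    y_indep = rng.standard_normal((n, 3))
    # y_dep is a noisy linear function of x, hence dependent on it
    y_dep = x @ rng.standard_normal((2, 3)) + 0.5 * rng.standard_normal((n, 3))
    print("independent pair:", bic_merge_score(x, y_indep))
    print("dependent pair:  ", bic_merge_score(x, y_dep))
```

Under this approximation, a positive score favours merging two clusters of variables and a negative score favours keeping them apart, so agglomeration can stop once no candidate pair scores above zero. Because the penalty grows with p·q, higher-dimensional pairs need proportionally more evidence of dependence, which is one way to read the dimensionality correction.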
