Accurate estimation of the normalized mutual information of multidimensional data

Daniel Nagel; Georg Diez; Gerhard Stock

arXiv:2405.04980 · physics.data-an · September 19, 2024

TL;DR

This paper introduces a new method to accurately estimate and normalize mutual information for multidimensional data, overcoming limitations of existing approaches and enabling better interpretation of correlations in complex datasets.

Contribution

A novel entropy-based normalization technique for mutual information that is invariant under variable transformations and compatible with k-nearest neighbor estimators.
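A standard identity for differential entropies shows why transformation invariance is not automatic for an entropy-based normalization. This is background only, not the normalization proposed in the paper. Rescaling $X$ by a constant $a \neq 0$ shifts its entropy, but the shift cancels in the mutual information:

```latex
\begin{aligned}
H(aX) &= H(X) + \ln|a|, \qquad H(aX, Y) = H(X, Y) + \ln|a|,\\
I(aX, Y) &= H(aX) + H(Y) - H(aX, Y)\\
         &= \bigl[H(X) + \ln|a|\bigr] + H(Y) - \bigl[H(X, Y) + \ln|a|\bigr] = I(X, Y).
\end{aligned}
```

A naive ratio such as $I(X, Y)/H(X)$ therefore changes with the units of $X$. Because differential entropies can be negative, such a ratio can even change sign.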

Findings

1. Validated on toy models, demonstrating accuracy
2. Applied to T4 lysozyme data, showing effective correlation measurement
3. Provides a numerically efficient algorithm for normalized MI estimation

Abstract

While the linear Pearson correlation coefficient represents a well-established normalized measure to quantify the interrelation of two stochastic variables $X$ and $Y$, it fails for multidimensional variables such as Cartesian coordinates. Avoiding any assumption about the underlying data, the mutual information $I(X, Y)$ does account for multidimensional correlations. However, unlike the normalized Pearson correlation, it has no upper bound ($I \in [0, \infty)$), i.e., it is not clear whether, say, $I = 0.4$ corresponds to a low or a high correlation. Moreover, the mutual information (MI) involves the estimation of high-dimensional probability densities (e.g., six-dimensional for Cartesian coordinates), which requires a k-nearest neighbor algorithm, such as the estimator by Kraskov et al. [Phys. Rev. E 69, 066138 (2004)]. As existing methods to normalize the MI cannot be used in connection…
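The sketch below shows what the k-nearest neighbor step involves. It is a minimal pure-Python version of the estimator of Kraskov et al. (their algorithm 1). It is not the authors' code and omits the normalization the paper proposes. The function names, the brute-force neighbor search, and the correlated-Gaussian test data are illustrative assumptions. For a Gaussian pair with correlation $\rho$ the exact value is $I = -\tfrac{1}{2}\ln(1-\rho^2)$ nats, which diverges as $|\rho| \to 1$. In that case $I = 0.4$ nats corresponds to $|\rho| = \sqrt{1 - e^{-0.8}} \approx 0.74$, a relation that holds only for Gaussians.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]
EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def digamma_table(n_max: int) -> List[float]:
    """Digamma values psi[m] for integers m = 1..n_max (psi[0] is unused).

    Uses psi(1) = -gamma and the recurrence psi(m + 1) = psi(m) + 1/m.
    """
    psi = [math.nan, -EULER_GAMMA]
    for m in range(1, n_max):
        psi.append(psi[m] + 1.0 / m)
    return psi


def max_norm(a: Point, b: Point) -> float:
    """Maximum (Chebyshev) norm of a - b."""
    return max(abs(u - v) for u, v in zip(a, b))


def ksg_mutual_information(xs: Sequence[Point], ys: Sequence[Point], k: int = 5) -> float:
    """Estimate I(X, Y) in nats with algorithm 1 of Kraskov et al. (2004).

    Each sample is a tuple, so X and Y may be multidimensional.
    Brute-force O(N^2) neighbor search: fine for a few hundred samples only.
    """
    n = len(xs)
    if n != len(ys) or not (0 < k < n):
        raise ValueError("need len(xs) == len(ys) and 0 < k < N")
    psi = digamma_table(n)
    acc = 0.0
    for i in range(n):
        dx = [max_norm(xs[i], xs[j]) for j in range(n)]
        dy = [max_norm(ys[i], ys[j]) for j in range(n)]
        # Joint-space distance: max norm over the (x, y) pair, excluding sample i.
        joint = sorted(max(dx[j], dy[j]) for j in range(n) if j != i)
        eps = joint[k - 1]  # distance to the k-th nearest neighbor of sample i
        # Count marginal neighbors strictly closer than eps (sample i excluded).
        nx = sum(1 for j in range(n) if j != i and dx[j] < eps)
        ny = sum(1 for j in range(n) if j != i and dy[j] < eps)
        acc += psi[nx + 1] + psi[ny + 1]
    return psi[k] + psi[n] - acc / n


def correlated_gaussian(n: int, rho: float, dim: int = 1,
                        seed: int = 0) -> Tuple[List[Point], List[Point]]:
    """Hypothetical test data: dim independent pairs (x_d, y_d), each with correlation rho."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    xs, ys = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = [rho * xi + s * rng.gauss(0.0, 1.0) for xi in x]
        xs.append(tuple(x))
        ys.append(tuple(y))
    return xs, ys


if __name__ == "__main__":
    xs, ys = correlated_gaussian(400, rho=0.8, seed=1)
    print("KSG estimate:", ksg_mutual_information(xs, ys, k=5))
    print("exact value: ", -0.5 * math.log(1.0 - 0.8 ** 2))
```

The brute-force search is chosen for readability. Practical implementations find the k-th joint neighbor and the marginal counts with a k-d tree or a similar spatial index. The estimate scatters around the exact Gaussian value, with a spread that shrinks as the number of samples grows.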