Ranking the information content of distance measures

Aldo Glielmo; Claudio Zeni; Bingqing Cheng; Gabor Csanyi; Alessandro Laio

arXiv:2104.15079·stat.ML·May 27, 2022

TL;DR

This paper introduces a statistical test to compare the information content of different distance measures, helping identify the most informative ones for various scientific applications.

Contribution

A novel statistical test is proposed to evaluate and compare the relative information retained by different distance measures in data analysis.

Findings

01 The test determines whether two distance measures are equivalent, independent, or whether one is more informative than the other.

02 Applied to Covid-19 data, it identifies the policy variables most relevant for controlling the epidemic.

03 It is also used to build compact yet informative representations of atomic structures.

Abstract

Real-world data typically contain a large number of features that are often heterogeneous in nature, relevance, and also units of measure. When assessing the similarity between data points, one can build various distance measures using subsets of these features. Using the fewest features but still retaining sufficient information about the system is crucial in many statistical learning approaches, particularly when data are sparse. We introduce a statistical test that can assess the relative information retained when using two different distance measures, and determine if they are equivalent, independent, or if one is more informative than the other. This in turn allows finding the most informative distance measure out of a pool of candidates. The approach is applied to find the most relevant policy variables for controlling the Covid-19 epidemic and to find compact yet informative…
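
To make the idea of comparing two distance measures concrete, here is a minimal sketch of a rank-based comparison in the spirit of the information imbalance defined in the full paper. For each point, it takes the nearest neighbour under distance A and asks what rank that same neighbour has under distance B. This is an assumption about the form of the test, since the abstract above does not spell it out. The function names `neighbour_ranks` and `information_imbalance` are hypothetical, plain Euclidean distances on feature subsets are assumed, and none of this is the API of the dadapy repository listed below.

```python
import numpy as np


def neighbour_ranks(X):
    """Return an (N, N) array whose entry [i, j] is the rank of point j
    among the neighbours of point i (1 = nearest), using Euclidean distance.
    Assumes no exact ties between distances."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)           # a point is never its own neighbour
    order = np.argsort(dist, axis=1)         # order[i, k] = index of k-th neighbour
    ranks = np.empty_like(order)
    rows = np.arange(n)[:, None]
    ranks[rows, order] = np.arange(1, n + 1)  # invert the permutation row by row
    return ranks


def information_imbalance(XA, XB):
    """Sketch of Delta(A -> B) = (2 / N) * mean rank in B of each point's
    nearest neighbour in A. Close to 0: A predicts B. Close to 1: A carries
    no information about B."""
    ranks_a = neighbour_ranks(XA)
    ranks_b = neighbour_ranks(XB)
    n = len(ranks_a)
    nearest_in_a = np.argmin(ranks_a, axis=1)  # index with rank 1 under A
    return 2.0 / n * ranks_b[np.arange(n), nearest_in_a].mean()


# Illustrative synthetic data (not from the paper): two features versus one.
rng = np.random.default_rng(0)
points = rng.normal(size=(200, 2))
full = points            # distance built on both features
partial = points[:, :1]  # distance built on the first feature only

print("full -> partial:", information_imbalance(full, partial))
print("partial -> full:", information_imbalance(partial, full))
```

Under these assumptions, comparing the two directions is what distinguishes the cases named in the abstract: both values small suggests equivalent measures, both near 1 suggests independent ones, and a clear asymmetry suggests that the measure with the smaller outgoing value is the more informative one.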

Code & Models

Repositories

sissa-data-science/dadapy
jax · Official