TL;DR
This paper introduces a statistical test to compare the information content of different distance measures, helping identify the most informative ones for various scientific applications.
Contribution
A novel statistical test is proposed to evaluate and compare the relative information retained by different distance measures in data analysis.
Findings
The test determines whether two distance measures are equivalent, independent, or whether one is more informative than the other.
The approach is applied to identify the most relevant policy variables for controlling the Covid-19 epidemic.
It is also used to build compact yet informative representations of atomic structures.
Abstract
Real-world data typically contain a large number of features that are often heterogeneous in nature, relevance, and also units of measure. When assessing the similarity between data points, one can build various distance measures using subsets of these features. Using the fewest features but still retaining sufficient information about the system is crucial in many statistical learning approaches, particularly when data are sparse. We introduce a statistical test that can assess the relative information retained when using two different distance measures, and determine if they are equivalent, independent, or if one is more informative than the other. This in turn allows finding the most informative distance measure out of a pool of candidates. The approach is applied to find the most relevant policy variables for controlling the Covid-19 epidemic and to find compact yet informative…
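The abstract does not spell out the test statistic, so the following is only a minimal sketch of one rank-based way such a comparison could be set up, not the paper's confirmed method. It assumes that distance A is informative about distance B when each point's nearest neighbour under A also has a low neighbour rank under B. The function names (pairwise_distances, rank_matrix, imbalance), the 2/N scaling, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between all rows of X (shape: n_points x n_features)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def rank_matrix(D):
    """ranks[i, j] = neighbour rank of point j as seen from point i (self has rank 0)."""
    D = D.copy()
    np.fill_diagonal(D, -np.inf)          # guarantee each point ranks itself first
    order = np.argsort(D, axis=1)         # order[i, k] = index of the k-th closest point to i
    n = D.shape[0]
    ranks = np.empty_like(order)
    ranks[np.arange(n)[:, None], order] = np.arange(n)[None, :]
    return ranks

def imbalance(D_a, D_b):
    """Assumed statistic: mean rank under B of the nearest neighbour under A, scaled by 2/N.

    Values near 0 suggest A predicts B's neighbourhoods; values near 1 suggest it does not.
    """
    n = D_a.shape[0]
    r_a = rank_matrix(D_a)
    r_b = rank_matrix(D_b)
    nn_a = np.argmax(r_a == 1, axis=1)    # index of each point's nearest neighbour under A
    return 2.0 / n * r_b[np.arange(n), nn_a].mean()

# Synthetic illustration (hypothetical data): A uses two features, B uses only the first.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
D_a = pairwise_distances(X)           # distance built on both features
D_b = pairwise_distances(X[:, :1])    # distance built on the first feature only

a_to_b = imbalance(D_a, D_b)  # expected to be small: A contains B's information
b_to_a = imbalance(D_b, D_a)  # expected to be larger: B misses the second feature
print(f"A -> B: {a_to_b:.3f}, B -> A: {b_to_a:.3f}")
```

Under this assumed statistic, the four outcomes named in the abstract map onto the pair of values: both small suggests equivalent measures, both near 1 suggests independent ones, and one small with the other large suggests that the first measure is the more informative.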
