Mahalanobis Distance Informed by Clustering
Almog Lahav, Ronen Talmon, Yuval Kluger

TL;DR
This paper introduces a novel Mahalanobis distance that leverages coordinate clustering in high-dimensional data, improving similarity measurement and revealing hidden structures, with applications in gene expression analysis and cancer prognosis.
Contribution
The paper proposes a new Mahalanobis distance based on coordinate clustering, enhancing high-dimensional data analysis and hidden variable recovery.
Findings
Improved estimation of principal directions in synthetic data.
Effective separation of risk groups in lung cancer gene expression data.
Enhanced recovery of Euclidean distances in hidden spaces.
Abstract
A fundamental question in data analysis, machine learning and signal processing is how to compare data points. The choice of distance metric is especially challenging for high-dimensional data sets, where the problem of meaningfulness is more prominent (e.g. the Euclidean distance between images). In this paper, we propose to exploit a property of high-dimensional data that is usually ignored: the structure stemming from the relationships between the coordinates. Specifically, we show that organizing similar coordinates in clusters can be exploited for the construction of the Mahalanobis distance between samples. When the observable samples are generated by a nonlinear transformation of hidden variables, the Mahalanobis distance allows the recovery of the Euclidean distances in the hidden space. We illustrate the advantage of our approach on a synthetic example…
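The idea of letting coordinate clusters inform the Mahalanobis distance can be illustrated with a minimal sketch. This is not the authors' algorithm: the smoothing rule (averaging each principal direction within a coordinate cluster), the function names `cluster_informed_pinv` and `mahalanobis_sq`, and the `labels` input are all assumptions made for illustration. The sketch only shows how a cluster structure on the coordinates could enter the inverse-covariance estimate that defines the distance.

```python
import numpy as np

def cluster_informed_pinv(X, labels, rank):
    """Rank-limited inverse covariance with cluster-smoothed principal directions.

    X      : (n_samples, n_coords) data matrix
    labels : (n_coords,) cluster id of each coordinate (hypothetical input,
             e.g. obtained by running k-means on the columns of X)
    rank   : number of principal directions kept; assumed not to exceed the
             number of clusters, with strictly positive kept eigenvalues
    """
    labels = np.asarray(labels)
    Xc = X - X.mean(axis=0)                      # center each coordinate
    cov = Xc.T @ Xc / (X.shape[0] - 1)           # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:rank]       # indices of the largest ones
    lam, U = eigvals[top], eigvecs[:, top]

    # Assumption: coordinates in the same cluster should carry the same
    # weight in each principal direction, so average within clusters.
    U_s = np.empty_like(U)
    for k in np.unique(labels):
        mask = labels == k
        U_s[mask] = U[mask].mean(axis=0)

    # Averaging breaks orthonormality; restore it with a reduced QR.
    Q, _ = np.linalg.qr(U_s)
    return Q @ np.diag(1.0 / lam) @ Q.T

def mahalanobis_sq(x, y, cov_pinv):
    """Squared Mahalanobis distance (x - y)^T C^+ (x - y)."""
    d = np.asarray(x) - np.asarray(y)
    return float(d @ cov_pinv @ d)
```

Calling `cluster_informed_pinv(X, labels, rank)` once and reusing the returned matrix in `mahalanobis_sq` for every pair of samples avoids recomputing the eigendecomposition per pair.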
Taxonomy
Topics: Data Mining Algorithms and Applications · Face and Expression Recognition · Algorithms and Data Compression
