Measuring What Matters: Intrinsic Distance Preservation as a Robust Metric for Embedding Quality
Steven N. Hart, Thomas E. Tavolara

TL;DR
This paper introduces IDPE, a new intrinsic evaluation method for embeddings based on Mahalanobis distance preservation, offering a task-independent assessment of embedding quality that is more reliable than traditional extrinsic metrics.
Contribution
The paper presents IDPE, a novel intrinsic evaluation metric that effectively measures the preservation of data structure in embeddings, addressing limitations of existing methods.
Findings
IDPE correlates well with embedding quality across datasets.
Traditional metrics can be misleading about true embedding quality.
IDPE provides new insights into PCA and t-SNE embeddings.
Abstract
Unsupervised embeddings are fundamental to numerous machine learning applications, yet their evaluation remains a challenging task. Traditional assessment methods often rely on extrinsic variables, such as performance in downstream tasks, which can introduce confounding factors and mask the true quality of embeddings. This paper introduces the Intrinsic Distance Preservation Evaluation (IDPE) method, a novel approach for assessing embedding quality based on the preservation of Mahalanobis distances between data points in the original and embedded spaces. We demonstrate the limitations of extrinsic evaluation methods through a simple example, highlighting how they can lead to misleading conclusions about embedding quality. IDPE addresses these issues by providing a task-independent measure of how well embeddings preserve the intrinsic structure of the original data. Our method leverages…
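The central idea in the abstract, comparing Mahalanobis distances between points before and after embedding, can be illustrated with a minimal sketch. This is not the paper's IDPE procedure, which the abstract does not fully specify. The function names `pairwise_mahalanobis` and `distance_preservation_score`, the use of each space's own sample covariance, and the choice of Pearson correlation as the summary score are all assumptions made for illustration.

```python
import numpy as np


def pairwise_mahalanobis(X):
    """Pairwise Mahalanobis distances between the rows of X.

    Assumption: the covariance is estimated from X itself, and a
    pseudo-inverse is used so a singular covariance does not fail.
    """
    X = np.asarray(X, dtype=float)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    cov_inv = np.linalg.pinv(cov)
    diff = X[:, None, :] - X[None, :, :]           # shape (n, n, d)
    d2 = np.einsum("ijk,kl,ijl->ij", diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))            # clip tiny negative round-off


def distance_preservation_score(X, Z):
    """Hypothetical score: Pearson correlation between the pairwise
    Mahalanobis distances in the original space X and the embedding Z.
    Values near 1 mean the distance structure is largely preserved.
    """
    D_x = pairwise_mahalanobis(X)
    D_z = pairwise_mahalanobis(Z)
    iu = np.triu_indices(D_x.shape[0], k=1)        # each unordered pair once
    return float(np.corrcoef(D_x[iu], D_z[iu])[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))

    # Toy "embedding": keep the first two coordinates of the data.
    Z_partial = X[:, :2]
    # Unrelated "embedding": random points with no link to X.
    Z_random = rng.normal(size=(100, 2))

    print("partial:", distance_preservation_score(X, Z_partial))
    print("random: ", distance_preservation_score(X, Z_random))
```

Because Mahalanobis distance rescales by the covariance, an embedding that is only an invertible linear transform of the original data scores essentially 1 under this sketch, while an embedding unrelated to the data scores near 0. The score needs no labels or downstream task, which is the sense in which such a measure is intrinsic.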
Taxonomy
Topics: Semantic Web and Ontologies · Data Management and Algorithms · Advanced Clustering Algorithms Research
Methods: Focus · Principal Components Analysis
