Opening the black-box of Neighbor Embedding with Hotelling's T2 statistic and Q-residuals
Roman Josef Rainer, Michael Mayr, Johannes Himmelbauer, Ramin Nikzad-Langerodi

TL;DR
This paper introduces a novel interpretability method for neighbor embedding techniques like t-SNE and UMAP, using PCA-based statistics to identify features responsible for local and global data structures.
Contribution
It proposes a new approach combining PCA, Q-residuals, and Hotelling's T2 to explain the features underlying neighbor embedding results, enhancing interpretability.
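As a rough illustration of the statistics named above, the following sketch computes Hotelling's T2 and Q-residuals, together with their per-feature contributions, from a PCA model fitted with NumPy. It is a generic chemometrics-style example under stated assumptions, not the authors' implementation: the function names `fit_pca` and `t2_q_contributions`, the synthetic data, and the choice of fitting PCA on one group and projecting another are all hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_pca(X_ref, n_components):
    """Fit PCA on reference data via SVD.

    Returns the column means, the loadings P (features x components),
    and the variance (eigenvalue) of each retained component.
    """
    X_ref = np.asarray(X_ref, dtype=float)
    mean = X_ref.mean(axis=0)
    _, s, Vt = np.linalg.svd(X_ref - mean, full_matrices=False)
    P = Vt[:n_components].T
    eigvals = s[:n_components] ** 2 / (X_ref.shape[0] - 1)
    return mean, P, eigvals


def t2_q_contributions(X, mean, P, eigvals):
    """Project X onto a fitted PCA model and return T2, Q and contributions."""
    Xc = np.asarray(X, dtype=float) - mean
    T = Xc @ P                              # scores
    # Per-feature T2 contributions; each row sums to that sample's T2.
    t2_contrib = Xc * ((T / eigvals) @ P.T)
    E = Xc - T @ P.T                        # part not explained by the model
    q_contrib = E ** 2                      # per-feature Q contributions
    return t2_contrib.sum(axis=1), q_contrib.sum(axis=1), t2_contrib, q_contrib


# Synthetic data (an assumption for illustration): six features driven by
# two latent factors; feature 5 carries only noise in the reference group.
rng = np.random.default_rng(0)
W = np.array([[1, 1, 1, 0, 0, 0],
              [0, 0, 1, 1, 1, 0]], dtype=float)


def sample(n):
    return rng.normal(size=(n, 2)) @ W + 0.1 * rng.normal(size=(n, 6))


group_a = sample(200)          # reference group used to fit the PCA model
group_b = sample(50)           # second group, shifted in feature 5
group_b[:, 5] += 3.0

mean, P, eigvals = fit_pca(group_a, n_components=2)
t2, q, t2_contrib, q_contrib = t2_q_contributions(group_b, mean, P, eigvals)

# The feature with the largest mean Q contribution is the one the
# reference model explains worst for the second group.
top_feature = int(np.argmax(q_contrib.mean(axis=0)))
print("feature with largest mean Q contribution:", top_feature)
```

In this toy setup the shifted feature (index 5) dominates the Q contributions, because the shift lies outside the subspace spanned by the reference model. A shift along a direction the model does capture would instead show up mainly in the T2 contributions.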
Findings
Identifies discriminatory features between data groups.
Enhances understanding of local and global data structures.
Provides visualization tools for interpretation.
Abstract
In contrast to classical techniques for exploratory analysis of high-dimensional data sets, such as principal component analysis (PCA), neighbor embedding (NE) techniques tend to better preserve the local structure/topology of high-dimensional data. However, the ability to preserve local structure comes at the expense of interpretability: Techniques such as t-Distributed Stochastic Neighbor Embedding (t-SNE) or Uniform Manifold Approximation and Projection (UMAP) do not give insights into which input variables underlie the topological (cluster) structure seen in the corresponding embedding. We here propose different "tricks" from the chemometrics field based on PCA, Q-residuals and Hotelling's T2 contributions in combination with novel visualization approaches to derive local and global explanations of neighbor embedding. We show how our approach is capable of identifying discriminatory…
Taxonomy
Topics: Computational Drug Discovery Methods · Sensory Analysis and Statistical Methods · Metabolomics and Mass Spectrometry Studies
Methods: Parametric UMAP · Principal Components Analysis
