Factoring out prior knowledge from low-dimensional embeddings

Edith Heiter; Jonas Fischer; Jilles Vreeken

arXiv:2103.01828 · cs.LG · March 3, 2021

TL;DR

This paper introduces two methods, JEDI and CONFETTI, that factor out prior knowledge from low-dimensional embeddings such as tSNE and UMAP, so that the visualization emphasizes structure that is not already known.

Contribution

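The paper presents two techniques for factoring out prior knowledge, given as distance matrices, from low-dimensional embeddings: JEDI, which adapts the tSNE objective using the Jensen-Shannon divergence, and CONFETTI, which operates directly on the input distance matrices and so works with any downstream embedding approach.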

Findings

01. Both methods effectively factor out prior knowledge.

02. They produce embeddings with clearer, more meaningful structures.

03. Experiments validate their utility on synthetic and real-world data.

Abstract

Low-dimensional embedding techniques such as tSNE and UMAP allow visualizing high-dimensional data and thereby facilitate the discovery of interesting structure. Although they are widely used, they visualize data as is, rather than in light of the background knowledge we have about the data. What we already know, however, strongly determines what is novel and hence interesting. In this paper we propose two methods for factoring out prior knowledge in the form of distance matrices from low-dimensional embeddings. To factor out prior knowledge from tSNE embeddings, we propose JEDI, which adapts the tSNE objective in a principled way using the Jensen-Shannon divergence. To factor out prior knowledge from any downstream embedding approach, we propose CONFETTI, in which we directly operate on the input distance matrices. Extensive experiments on both synthetic and real-world data show that both…
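
The abstract names the Jensen-Shannon divergence as the quantity JEDI uses to adapt the tSNE objective. As a reminder of what that divergence measures, here is a minimal sketch of its standard textbook definition for two discrete distributions. It is not the JEDI objective itself, which the abstract does not spell out, and the function names `kl_divergence` and `js_divergence` are illustrative choices rather than anything from the paper.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats.

    Terms with p_i == 0 contribute nothing, following the usual convention.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    Defined as the average KL divergence of p and q to their mixture
    m = (p + q) / 2. It is symmetric and bounded above by log(2).
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    same = js_divergence([0.5, 0.5], [0.5, 0.5])
    disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])
    print(same, disjoint)
```

Unlike the Kullback-Leibler divergence used in the standard tSNE objective, this quantity is symmetric in its two arguments and stays finite even when the distributions do not overlap.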

Taxonomy

Topics: Gene expression and cancer classification · Domain Adaptation and Few-Shot Learning · Face and Expression Recognition